Let $\xi_0, \xi_1, \ldots, \xi_{\omega-1}$ be observations from the hidden Markov model
with probability distribution $P^{\theta_0}$, and let $\xi_\omega, \xi_{\omega+1}, \ldots$ be
observations from the hidden Markov model with probability distribution
$P^{\theta_1}$. The parameters $\theta_0$ and $\theta_1$ are given, while the change point $\omega$
is unknown. The problem is to raise an alarm as soon as possible
after the distribution changes from $P^{\theta_0}$ to $P^{\theta_1}$, but to avoid false
alarms. Specifically, we seek a stopping rule $N$ which allows us to
observe the $\xi$'s sequentially, such that $E_\infty N$ is large, and subject to
this constraint, $\sup_k E_k(N - k \mid N \ge k)$ is as small as possible. Here
$E_k$ denotes expectation under the change point $k$, and $E_\infty$ denotes
expectation under the hypothesis of no change whatever.

In this paper we investigate the performance of the
Shiryayev-Roberts-Pollak (SRP) rule for change point detection in the dynamic
system of hidden Markov models. By making use of Markov chain
representation for the likelihood function, the structure of asymptotically
minimax policy and of the Bayes rule, and sequential hypothesis
testing theory for Markov random walks, we show that the SRP procedure
is asymptotically minimax in the sense of Pollak [Ann. Statist.
13 (1985) 206-227]. Next, we present a second-order asymptotic
approximation for the expected stopping time of such a stopping scheme
when $\omega = 1$. Motivated by the sequential analysis in hidden Markov
models, a nonlinear renewal theory for Markov random walks is also
given.


1. Introduction. The problem of quick detection, with low false-alarm
rate, of abrupt changes in stochastic dynamic systems arises in a variety of
applications, including industrial quality control, segmentation of signals,
financial engineering, biomedical signal processing, edge detection in images,
and the diagnosis of faults in the elements of computer communication
networks. Comprehensive summaries of this area were given by Basseville and
Nikiforov (1993) and Lai (1995, 2001). A typical problem in the segmentation
of signals is to use automatic segmentation as the first processing step:
a segmentation algorithm splits the signal into homogeneous segments, the
lengths of which are adapted to the local characteristics of the analyzed
signal. The main desired properties of a segmentation algorithm are few
false alarms and missed detections, and low detection delay. In the standard
formulation of the change point detection problem, there is a sequence of
observations whose distribution changes at some unknown time $\omega$, and
the goal is to detect this change as soon as possible under false alarm
constraints. The reader is referred to Braun and Müller (1998) for a nice
discussion of hidden Markov models for DNA data and change point detection
analysis.

When the observations $\xi_n$ are independent with a common density
function $f_{\theta_0}$ for $n < \omega$ and with another common density function
$f_{\theta_1}$ for $n \ge \omega$, a minimax formulation was proposed by Lorden (1971),
in which he showed that subject to the “average run length” (ARL) constraint,
Page’s CUSUM procedure asymptotically minimizes the “worst case” detection
delay. Instead of studying the optimal detection problem via sequential
testing theory, Moustakides (1986) formulated the worst case detection delay
problem subject to an ARL constraint as an optimal stopping problem.
Ritov (1990) later gave a simpler proof. For change point detection in
complex dynamic systems beyond the i.i.d. setting, Bansal and
Papantoni-Kazakos (1986) extended Lorden’s asymptotic theory to the
case where $\xi_n$ are stationary ergodic sequences, under the condition that
$\{\xi_i, i < \omega\}$ (before the change point) and $\{\xi_j, j \ge \omega\}$ (after the
change point) are independent, and proved the asymptotic optimality of the
CUSUM algorithm. Further extensions to general stochastic sequences were
obtained by Lai (1995, 1998). Moreover, using a change of measure argument,
Lai (1998) also established the asymptotic optimality of the CUSUM rule under
several alternative performance criteria. In the dynamic system of hidden
Markov models, Fuh (2003) proved that the CUSUM scheme is asymptotically
optimal in the sense of Lorden (1971). His method related the CUSUM
procedure to certain one-sided sequential probability ratio tests in hidden
Markov models, which had been shown, in Section 4 of Fuh (2003),
to be asymptotically optimal for testing simple hypotheses.

In the simple system of independent observations before and after the
change, a Bayesian formulation was proposed by Shiryayev (1963,
1978), in which the change point is assumed to have a geometric prior
distribution, and the goal is to minimize the expected delay subject to an
upper bound on the false alarm probability. He used optimal stopping theory to
show that the Bayes rule triggers an alarm as soon as the posterior
probability that a change has occurred exceeds some fixed level. Roberts (1966)
considered the non-Bayesian setting, studied the average run length of this
rule by simulation, and found it to be very good. Pollak and Siegmund
(1975) extended Shiryayev’s work to a non-Bayesian setting, and Pollak
(1985) showed that the (modified) Shiryayev-Roberts rule is asymptotically
minimax under the formulation of Pollak and Siegmund (1975). Later Yakir
(1997) proved that the procedure is strictly optimal for a slight
reformulation of the problem. Finally, we mention that Yakir (1994) studied
Bayesian optimal detection for a finite state Markov chain.

As noted by Basseville and Nikiforov (1993) in their monograph, there 
is a great deal of literature on detection algorithms in complex systems 
but relatively little on the statistical properties and optimality theory of 
detection procedures beyond very simple models. The primary goal of this 
paper is to investigate theoretical aspects of the Shiryayev-Roberts-Pollak
(SRP) change point detection rule in hidden Markov models. We show that
the SRP procedure is asymptotically minimax in the sense of Pollak (1985).
Next, we present a second-order asymptotic approximation for the expected
stopping time of such a stopping scheme when $\omega = 1$. Motivated by the
sequential analysis in hidden Markov models, a nonlinear renewal theory for 
Markov random walks is also given. 

This paper is organized as follows. In Section 2 we define the hidden
Markov model and formulate the sequential change point detection
problem. Then we provide a Markov chain representation of the likelihood ratio.
A nonlinear Markov renewal theory is given in Section 3. In Section 4 we
show that the SRP rule is asymptotically minimax under mild conditions.
In Section 5 we study the asymptotic operating characteristics of the
detection procedure, and derive a second-order asymptotic approximation for the
expected stopping time when $\omega = 1$. All proofs are given in Sections 6, 7
and 8.

2. Problem formulation. A hidden Markov model is defined as a
parameterized Markov chain in a Markovian random environment [Fuh (2003)], with
the underlying environmental Markov chain viewed as missing data. That is,
for each $\theta \in \Theta \subset R^q$, the unknown parameter, we consider
$X = \{X_n, n \ge 0\}$ as an ergodic (positive recurrent, irreducible and
aperiodic) Markov chain on a finite state space $D = \{1, 2, \ldots, d\}$, with
transition probability matrix $P(\theta) = [p_{xy}(\theta)]_{x,y=1,\ldots,d}$ and
stationary distribution $\pi(\theta) = (\pi_x(\theta))_{x=1,\ldots,d}$.
Suppose that an additive component $\xi_n$, taking values in $R$, is adjoined to
the chain such that $\{(X_n, \xi_n), n \ge 0\}$ is a Markov chain on $D \times R$,
satisfying $P^\theta\{X_1 \in A \mid X_0 = x, \xi_0 = s\} = P^\theta\{X_1 \in A \mid X_0 = x\}$
for $A \in \mathcal{B}(D)$. And conditioning on the full $X$ sequence, $\xi_n$ is a
Markov chain with probability

(2.1)
$$P^\theta\{\xi_{n+1} \in B \mid X_0, X_1, \ldots; \xi_0, \xi_1, \ldots, \xi_n\}
  = P^\theta\{\xi_{n+1} \in B \mid X_{n+1}; \xi_n\} = P^\theta(X_{n+1} : \xi_n, B) \quad \text{a.s.}$$

for each $n$ and $B \in \mathcal{B}(R)$, the Borel $\sigma$-algebra of $R$. Note that in (2.1) the
conditional probability of $\xi_{n+1}$ depends on $X_{n+1}$ and $\xi_n$ only. Furthermore,
we assume the existence of a transition probability density for the Markov
chain $\{(X_n, \xi_n), n \ge 0\}$ with respect to a $\sigma$-finite measure $\mu$ on $R$ such that

(2.2)
$$P^\theta\{X_1 \in A, \xi_1 \in B \mid X_0 = x, \xi_0 = s_0\}
  = P^\theta\{\xi_1 \in B \mid X_1 \in A, X_0 = x, \xi_0 = s_0\}\, P^\theta\{X_1 \in A \mid X_0 = x\}
  = \sum_{y \in A} \int_{s \in B} p_{xy}(\theta) f(s; \varphi_y(\theta) \mid s_0)\, d\mu(s),$$

where $f(\xi_k; \varphi_{X_k}(\theta) \mid \xi_{k-1})$ is the transition probability density of $\xi_k$ given
$\xi_{k-1}$ and $X_k$ with respect to $\mu$, $\theta \in \Theta$ is the unknown parameter, and $\varphi_y(\cdot)$
is a function defined on the parameter space $\Theta$ for each $y = 1, \ldots, d$. Here and
in the sequel we assume the Markov chain $\{(X_n, \xi_n), n \ge 0\}$ has stationary
probability $\Gamma$ with probability density $\pi_x(\theta) f(\cdot; \varphi_x(\theta))$ with respect to $\mu$. In
this paper we assume that only one parameter is of interest and treat the
other parameters as nuisance parameters. That is, for simplicity we consider
$\theta \in \Theta \subset R$ as a one-dimensional unknown parameter. For convenience of
notation, we write $\pi_x$ for $\pi_x(\theta)$ and $p_{xy}$ for $p_{xy}(\theta)$. We call a process
$\{\xi_n, n \ge 0\}$ a hidden Markov model if there is a Markov chain $\{X_n, n \ge 0\}$ such that
the process $\{(X_n, \xi_n), n \ge 0\}$ satisfies (2.1) and (2.2).

Let $\xi_0, \xi_1, \ldots, \xi_{\omega-1}$ be the observations from the hidden Markov model
$\{\xi_n, n \ge 0\}$ with distribution $P^{\theta_0}$, and let $\xi_\omega, \xi_{\omega+1}, \ldots$ be the
observations from the hidden Markov model $\{\xi_n, n \ge 0\}$ with distribution
$P^{\theta_1}$. Both $\theta_0$ and $\theta_1$ are given, while the change point $\omega$ is unknown.
We shall use $P_\omega$ to denote such a probability measure (with change time $\omega$)
and use $P_\infty$ to denote the case $\omega = \infty$ (no change point). Denote $E_\omega$ as the
corresponding expectation under $P_\omega$. The objectives are to raise an alarm as
soon as possible after the change and to avoid false alarms. A detection scheme is
a stopping time on the sequence of observations and aims to minimize the
number of post change observations. Hence, the stopping time $N$ should
satisfy $\{N \ge \omega\}$ but, at the same time, keep $N - \omega$ small. In this paper we
use the functional studied by Pollak and Siegmund (1975) and Pollak (1985)
to find a stopping time $N$ to minimize

(2.3)    $\sup_{1 \le k < \infty} E_k(N - k \mid N \ge k)$

subject to

(2.4)    $E_\infty N \ge \gamma$,

for some specified (large) constant $\gamma$. A detection scheme is called
asymptotically minimax if it minimizes (2.3), within an $o(1)$ order, among all
stopping rules that satisfy $E_\infty N \ge \gamma$, where $o(1) \to 0$ as $\gamma \to \infty$.

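To describe the SRP change point detection scheme, we need the following
notation. Fix $\theta_0, \theta_1 \in \Theta$. Let $\xi_0, \xi_1, \ldots, \xi_n$ be the observations given
from the hidden Markov model $\{\xi_n, n \ge 0\}$. Denote

(2.5)
$$LR_n := \frac{p_n(\xi_0, \xi_1, \ldots, \xi_n; \theta_1)}{p_n(\xi_0, \xi_1, \ldots, \xi_n; \theta_0)}
  = \frac{\sum_{x_0=1}^d \cdots \sum_{x_n=1}^d \pi_{x_0}(\theta_1) f(\xi_0; \varphi_{x_0}(\theta_1))
          \prod_{l=1}^n p_{x_{l-1} x_l}(\theta_1) f(\xi_l; \varphi_{x_l}(\theta_1) \mid \xi_{l-1})}
         {\sum_{x_0=1}^d \cdots \sum_{x_n=1}^d \pi_{x_0}(\theta_0) f(\xi_0; \varphi_{x_0}(\theta_0))
          \prod_{l=1}^n p_{x_{l-1} x_l}(\theta_0) f(\xi_l; \varphi_{x_l}(\theta_0) \mid \xi_{l-1})}$$

as the likelihood ratio. For $0 \le k \le n$, let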


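(2.6)
$$LR_n^k := \frac{p_n(\xi_k, \xi_{k+1}, \ldots, \xi_n; \theta_1)}{p_n(\xi_k, \xi_{k+1}, \ldots, \xi_n; \theta_0)}
  = \frac{\sum_{x_k=1}^d \cdots \sum_{x_n=1}^d \prod_{l=k}^n p_{x_{l-1} x_l}(\theta_1) f(\xi_l; \varphi_{x_l}(\theta_1) \mid \xi_{l-1})}
         {\sum_{x_k=1}^d \cdots \sum_{x_n=1}^d \prod_{l=k}^n p_{x_{l-1} x_l}(\theta_0) f(\xi_l; \varphi_{x_l}(\theta_0) \mid \xi_{l-1})}.$$

Given an approximate threshold $B > 0$ and setting $b = \log B$, define the
Shiryayev-Roberts scheme

(2.7)
$$N_b := \inf\Big\{n: \sum_{k=0}^n LR_n^k \ge B\Big\} = \inf\Big\{n: \log \sum_{k=0}^n LR_n^k \ge b\Big\}.$$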

A simple modification of (2.7) was given by Pollak (1985) by adding a
randomization on the initial value of the $LR$ statistic. This will be defined
precisely in Section 4.
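To make the stopping rule (2.7) concrete, the following is a minimal sketch and not the procedure analyzed in this paper. It assumes Gaussian emission densities whose mean depends on the hidden state, emissions that do not depend on the previous observation, and that each segment $\xi_k, \ldots, \xi_n$ is started from the supplied initial distribution. The function names `gauss_pdf`, `hmm_loglik` and `shiryaev_roberts_alarm` are hypothetical and introduced only for illustration.

```python
import math

def gauss_pdf(x, mean, sigma=1.0):
    """Gaussian emission density; an illustrative choice of f, not the paper's."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def hmm_loglik(obs, trans, init, means):
    """Log-likelihood of obs under one parameter value (scaled forward recursion)."""
    d = len(init)
    alpha = [init[x] * gauss_pdf(obs[0], means[x]) for x in range(d)]
    total = sum(alpha)
    loglik = math.log(total)
    alpha = [a / total for a in alpha]
    for s in obs[1:]:
        # Propagate one step of the hidden chain, then weight by the emission density.
        alpha = [sum(alpha[x] * trans[x][y] for x in range(d)) * gauss_pdf(s, means[y])
                 for y in range(d)]
        total = sum(alpha)
        loglik += math.log(total)
        alpha = [a / total for a in alpha]
    return loglik

def shiryaev_roberts_alarm(obs, model0, model1, threshold):
    """First n with sum_{k=0}^n LR_n^k >= threshold, or None if no alarm is raised.

    Each model is a tuple (trans, init, means). The statistic is recomputed
    from scratch at every n, which is simple but costs O(n^2) likelihood calls.
    """
    for n in range(len(obs)):
        stat = 0.0
        for k in range(n + 1):
            segment = obs[k:n + 1]
            stat += math.exp(hmm_loglik(segment, *model1) - hmm_loglik(segment, *model0))
        if stat >= threshold:
            return n
    return None
```

When the pre-change and post-change models coincide, every $LR_n^k$ equals 1, so the statistic at time $n$ is $n + 1$ and the alarm time is determined by the threshold alone; this gives a simple sanity check of the sketch.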

Although the SRP rule (2.5)-(2.7) is asymptotically minimax in the i.i.d.
case [Pollak (1985)], it is nontrivial whether this is still true for hidden
Markov models. To give a definitive answer to this question, we need to study
the likelihood ratio $LR_n$ that appeared in (2.5); (2.6) can be analyzed in
the same manner. Note that the nonadditive form of (2.5) makes it difficult to
analyze. A key idea to overcome this difficulty is to represent the likelihood
ratio (2.5) as the ratio of $L_1$-norms of products of Markov random matrices.
This device was proposed by Fuh (2003) to study the SPRT and CUSUM for hidden
Markov models. Here, we carry out the same idea to obtain a representation of
the likelihood ratio $LR_n$.

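Given a column vector $u = (u_1, \ldots, u_d)^t \in R^d$, where $t$ denotes the
transpose of the underlying vector in $R^d$, define the $L_1$-norm of $u$ as
$\|u\| = \sum_{i=1}^d |u_i|$. The likelihood ratio (2.5) can be represented as

(2.8)
$$LR_n = \frac{p_n(\xi_0, \xi_1, \ldots, \xi_n; \theta_1)}{p_n(\xi_0, \xi_1, \ldots, \xi_n; \theta_0)}
  = \frac{\|M_n(\theta_1) \cdots M_1(\theta_1) M_0(\theta_1) \pi(\theta_1)\|}
         {\|M_n(\theta_0) \cdots M_1(\theta_0) M_0(\theta_0) \pi(\theta_0)\|},$$

where, for $\theta = \theta_0$ or $\theta_1$,

(2.9)
$$M_0 = M_0(\theta) = \mathrm{diag}\big(f(\xi_0; \varphi_1(\theta)), \ldots, f(\xi_0; \varphi_d(\theta))\big),$$

(2.10)
$$M_k = M_k(\theta) = \begin{pmatrix}
  p_{11}(\theta) f(\xi_k; \varphi_1(\theta) \mid \xi_{k-1}) & \cdots & p_{d1}(\theta) f(\xi_k; \varphi_1(\theta) \mid \xi_{k-1}) \\
  \vdots & \ddots & \vdots \\
  p_{1d}(\theta) f(\xi_k; \varphi_d(\theta) \mid \xi_{k-1}) & \cdots & p_{dd}(\theta) f(\xi_k; \varphi_d(\theta) \mid \xi_{k-1})
\end{pmatrix}$$

for $k = 1, \ldots, n$, and

(2.11)
$$\pi(\theta) = (\pi_1(\theta), \ldots, \pi_d(\theta))^t.$$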
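The identity behind (2.8) is that the sum over hidden paths in (2.5) equals the $L_1$-norm of a matrix product applied to $\pi(\theta)$. The sketch below checks this numerically for a single parameter value under illustrative assumptions: Gaussian emissions with state-dependent mean and no dependence on the previous observation. The names `likelihood_matrix_form` and `likelihood_path_sum` are hypothetical.

```python
import itertools
import math

def gauss_pdf(x, mean):
    """Unit-variance Gaussian density, used here as an illustrative f."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def likelihood_matrix_form(obs, trans, pi, means):
    """|| M_n ... M_1 M_0 pi ||_1 with M_0 diagonal and M_k[y][x] = p_xy * f(xi_k; phi_y)."""
    d = len(pi)
    v = [gauss_pdf(obs[0], means[x]) * pi[x] for x in range(d)]           # M_0 pi
    for s in obs[1:]:
        M = [[trans[x][y] * gauss_pdf(s, means[y]) for x in range(d)]     # row y, column x
             for y in range(d)]
        v = [sum(M[y][x] * v[x] for x in range(d)) for y in range(d)]     # M_k v
    return sum(abs(c) for c in v)                                          # L1-norm

def likelihood_path_sum(obs, trans, pi, means):
    """Brute-force sum over all hidden paths (x_0, ..., x_n), as in the numerator of (2.5)."""
    d = len(pi)
    total = 0.0
    for path in itertools.product(range(d), repeat=len(obs)):
        term = pi[path[0]] * gauss_pdf(obs[0], means[path[0]])
        for l in range(1, len(obs)):
            term *= trans[path[l - 1]][path[l]] * gauss_pdf(obs[l], means[path[l]])
        total += term
    return total
```

The path sum has $d^{n+1}$ terms while the matrix form needs only $n$ matrix-vector products, which is why the matrix representation is also the practical way to evaluate the likelihood.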


Let $\{(X_n, \xi_n), n \ge 0\}$ be the Markov chain defined in (2.1) and (2.2).
Denote $Y_n := (X_n, \xi_n)$ and $D' := D \times R$. Define $Gl(d, R)$ as the set of invertible
$d \times d$ matrices with real entries. For given $k = 0, 1, \ldots, n$, and $\theta = \theta_0$ or $\theta_1$, let
$M_k(\theta)$ be the random matrix from $D' \times D'$ to $Gl(d, R)$, as defined in (2.9)
and (2.10). For convenience of notation, we still denote $\theta = (\theta_0, \theta_1)$ and let

(2.12)
$$T_n(\theta) = M_n(\theta) \cdots M_0(\theta)
  = (T_n(\theta_0), T_n(\theta_1)) = (M_n(\theta_0) \cdots M_0(\theta_0),\; M_n(\theta_1) \cdots M_0(\theta_1)).$$

Then the system $\{(Y_n, T_n(\theta)), n \ge 0\}$ is called a product of Markov random
matrices on $D' \times Gl(d, R) \times Gl(d, R)$. Denote $P_y^\theta$ as the probability
distribution of $\{(Y_n, T_n(\theta)), n \ge 0\}$ with $Y_0 = y$, and $E_y^\theta$ as the expectation
under $P_y^\theta$.

Let $u \in R^d$ be a $d$-dimensional vector, $\bar u := u/\|u\|$ the normalization of $u$
($\|u\| \ne 0$), and denote $P(R^d)$ as the projection space of $R^d$ which contains
all elements $\bar u$. For given $\bar u \in P(R^d)$ and $M \in Gl(d, R)$, denote $M \cdot \bar u = \overline{Mu}$
and $T_k(\theta) \bar u = (T_k(\theta_0) \bar u, T_k(\theta_1) \bar u)$, for $k = 0, \ldots, n$. Let

(2.13)
$$W_0^\theta = (Y_0, \overline{T_0(\theta) u}),\quad W_1^\theta = (Y_1, \overline{T_1(\theta) u}),\quad \ldots,\quad W_n^\theta = (Y_n, \overline{T_n(\theta) u}).$$
Then $\{W_n^\theta, n \ge 0\}$ is a Markov chain on the state space $D' \times P(R^d) \times P(R^d)$
with the transition kernel

(2.14)
$$P^\theta((y, \bar u), A \times B) := E_y^\theta\big(I_{A \times B}(Y_1, \overline{M_1(\theta) u})\big)$$

for all $y \in D'$, $\bar u := (\bar u, \bar u) \in P(R^d) \times P(R^d)$, $A \in \mathcal{B}(D')$, and
$B \in \mathcal{B}(P(R^d) \times P(R^d))$, the Borel $\sigma$-algebra of $P(R^d) \times P(R^d)$. For simplicity
we let $P^\theta := P^\theta(\cdot, \cdot)$ and denote $E^\theta$ as the expectation under $P^\theta$. Since the
Markov chain $\{(X_n, \xi_n), n \ge 0\}$ has transition probability density and the random
matrix $M_1(\theta)$ is driven by $\{(X_n, \xi_n), n \ge 0\}$, it implies that the induced
transition probability $P(\cdot, \cdot)$ has a density with respect to $\mu$. Denote it as $p$
for simplicity. Under Condition C given below, the Markov chain $\{W_n^\theta, n \ge 0\}$
has an invariant probability measure $m^\theta$ on $D' \times P(R^d) \times P(R^d)$; see Fuh
(2003).

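Now, for $y_0, y_1 \in D'$, $\bar u = \bar u(\theta) = (\bar u(\theta_0), \bar u(\theta_1)) \in P(R^d) \times P(R^d)$ and
$M = M(y_0, y_1) = M(\theta) = (M(\theta_0), M(\theta_1)) \in Gl(d, R) \times Gl(d, R)$, let
$\sigma: (D' \times P(R^d) \times P(R^d)) \times (D' \times P(R^d) \times P(R^d)) \to R$ be

$$\sigma((y_0, \bar u), (y_1, \overline{Mu}))
  = \log \frac{\|M(\theta_1) u(\theta_1)\| / \|u(\theta_1)\|}{\|M(\theta_0) u(\theta_0)\| / \|u(\theta_0)\|}.$$

For $\pi(\theta_0), \pi(\theta_1) \in P(R^d)$, denote

$$\sigma(W_0, W_0) = \log \frac{\|T_0(\theta_1) \pi(\theta_1)\| / \|\pi(\theta_1)\|}{\|T_0(\theta_0) \pi(\theta_0)\| / \|\pi(\theta_0)\|}.$$

Then

(2.15)
$$\begin{aligned}
\log LR_n
 &= \log \frac{\|M_n(\theta_1) \cdots M_1(\theta_1) M_0(\theta_1) \pi(\theta_1)\|}{\|M_n(\theta_0) \cdots M_1(\theta_0) M_0(\theta_0) \pi(\theta_0)\|} \\
 &= \log \frac{\|T_n(\theta_1) \pi(\theta_1)\| / \|T_{n-1}(\theta_1) \pi(\theta_1)\|}{\|T_n(\theta_0) \pi(\theta_0)\| / \|T_{n-1}(\theta_0) \pi(\theta_0)\|}
  + \cdots
  + \log \frac{\|T_1(\theta_1) \pi(\theta_1)\| / \|T_0(\theta_1) \pi(\theta_1)\|}{\|T_1(\theta_0) \pi(\theta_0)\| / \|T_0(\theta_0) \pi(\theta_0)\|} \\
 &\quad + \log \frac{\|T_0(\theta_1) \pi(\theta_1)\| / \|\pi(\theta_1)\|}{\|T_0(\theta_0) \pi(\theta_0)\| / \|\pi(\theta_0)\|} \\
 &= \sigma(W_{n-1}^\theta, W_n^\theta) + \cdots + \sigma(W_0^\theta, W_1^\theta) + \sigma(W_0^\theta, W_0^\theta)
\end{aligned}$$

is an additive functional of the Markov chain $\{W_n^\theta, n \ge 0\}$.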
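The telescoping in (2.15) can be checked numerically. The following sketch, under the same illustrative assumptions as before (Gaussian emissions with state-dependent mean, no dependence on the previous observation), carries the normalized vectors in $P(R^d)$ forward and records one increment per step; the increments should sum to $\log LR_n$. The names `log_lr_increments` and `direct_log_lr` are hypothetical.

```python
import math

def gauss_pdf(x, mean):
    """Unit-variance Gaussian density, an illustrative choice of f."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def apply_step(u, s, trans, means):
    """Apply M_k to u; return (L1-norm of M_k u, normalized M_k u).

    All entries are nonnegative here, so the L1-norm is the plain sum.
    """
    d = len(u)
    v = [sum(trans[x][y] * u[x] for x in range(d)) * gauss_pdf(s, means[y]) for y in range(d)]
    norm = sum(v)
    return norm, [c / norm for c in v]

def log_lr_increments(obs, model0, model1):
    """Increments of the additive functional; their sum is log LR_n."""
    (P0, pi0, m0), (P1, pi1, m1) = model0, model1
    d = len(pi0)
    # Initial term: M_0 is diagonal, and ||pi|| = 1 for both parameter values.
    v0 = [pi0[x] * gauss_pdf(obs[0], m0[x]) for x in range(d)]
    v1 = [pi1[x] * gauss_pdf(obs[0], m1[x]) for x in range(d)]
    n0, n1 = sum(v0), sum(v1)
    u0, u1 = [c / n0 for c in v0], [c / n1 for c in v1]
    increments = [math.log(n1 / n0)]
    for s in obs[1:]:
        n0, u0 = apply_step(u0, s, P0, m0)
        n1, u1 = apply_step(u1, s, P1, m1)
        increments.append(math.log(n1 / n0))
    return increments

def direct_log_lr(obs, model0, model1):
    """log LR_n computed from the unnormalized matrix products in (2.8)."""
    def likelihood(model):
        P, pi, m = model
        d = len(pi)
        v = [pi[x] * gauss_pdf(obs[0], m[x]) for x in range(d)]
        for s in obs[1:]:
            v = [sum(P[x][y] * v[x] for x in range(d)) * gauss_pdf(s, m[y]) for y in range(d)]
        return sum(v)
    return math.log(likelihood(model1) / likelihood(model0))
```

Working with normalized vectors also avoids the numerical underflow that the unnormalized products would suffer for long observation sequences.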


3. A nonlinear Markov renewal theory. Note that $\{W_n^\theta, n \ge 0\}$ defined
in (2.13) is a Markov chain on a general state space $D' \times P(R^d) \times P(R^d)$. In
this section, we abuse the notation slightly and let $\{X_n, n \ge 0\}$ be a Markov
chain on a general state space $\mathcal{X}$ with $\sigma$-algebra $\mathcal{A}$, which is irreducible with
respect to a maximal irreducibility measure on $(\mathcal{X}, \mathcal{A})$ and is aperiodic. Let
$S_n = \sum_{k=1}^n \xi_k$ be the additive component, taking values on the real line $R$,
such that $\{(X_n, S_n), n \ge 0\}$ is a Markov chain on $\mathcal{X} \times R$ with transition
probability

(3.1)
$$P\{(X_{n+1}, S_{n+1}) \in A \times (B + s) \mid (X_n, S_n) = (x, s)\}
  = P\{(X_1, S_1) \in A \times B \mid (X_0, S_0) = (x, 0)\} = P(x, A \times B),$$

for all $x \in \mathcal{X}$, $A \in \mathcal{A}$ and $B \in \mathcal{B}(R)$ (:= Borel $\sigma$-algebra on $R$). The chain
$\{(X_n, S_n), n \ge 0\}$ is called a Markov random walk. In this section, let $P_\nu$ ($E_\nu$)
denote the probability (expectation) under the initial distribution on $X_0$
being $\nu$. If $\nu$ is degenerate at $x$, we shall simply write $P_x$ ($E_x$) instead of
$P_\nu$ ($E_\nu$). We assume throughout this section that there exists a stationary
probability distribution $\pi$, $\pi(A) = \int P(x, A)\, d\pi(x)$ for all $A \in \mathcal{A}$, and
$E_\pi \xi_1 > 0$.

Let $\{Z_n = S_n + \eta_n, n \ge 0\}$ be a perturbed Markov random walk in the
following sense: $S_n$ is a Markov random walk, $\eta_n$ is $\mathcal{F}_n$-measurable, where
$\mathcal{F}_n$ is the $\sigma$-algebra generated by $\{(X_k, S_k), 0 \le k \le n\}$, and $\eta_n$ is slowly
changing, that is, $\max_{1 \le t \le n} |\eta_t|/n \to 0$ in probability. Let
$\{A = A(t; \lambda), \lambda \in \Lambda\}$ be a family of boundary functions for some index set $\Lambda$. Define

(3.2)    $T = T_\lambda = \inf\{n \ge 1: Z_n > A(n; \lambda)\}$, $\inf \emptyset = \infty$ for each $\lambda \in \Lambda$.

It is easy to see that for all $\lambda > 0$, $T_\lambda < \infty$ with probability 1. This
section concerns the approximations of the distribution of the overshoot and
expected stopping time $E_\nu T$ as the boundary tends to infinity.
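As an illustration of the stopping time (3.2), the following sketch simulates a perturbed Markov random walk and returns the first time it crosses a curved boundary. Everything specific here is an assumption made for the example: a two-state driving chain started in state 0, Gaussian increments whose mean depends on the current state, and caller-supplied `boundary` and `perturbation` functions. The name `first_passage` is hypothetical.

```python
import math
import random

def first_passage(increment_means, trans, boundary, perturbation, rng, max_steps=100000):
    """Simulate T = inf{n >= 1: S_n + eta_n > A(n)} and return (T, path of Z_n).

    The driving chain has two states and starts in state 0 (an assumption).
    If no crossing occurs within max_steps, T is reported as infinity,
    matching the convention inf(empty set) = infinity.
    """
    x = 0
    s = 0.0
    path = []
    for n in range(1, max_steps + 1):
        x = 0 if rng.random() < trans[x][0] else 1      # one step of the hidden chain
        s += rng.gauss(increment_means[x], 1.0)         # increment xi_n driven by X_n
        z = s + perturbation(n)                         # Z_n = S_n + eta_n
        path.append(z)
        if z > boundary(n):
            return n, path
    return math.inf, path
```

For example, with positive increment means, `boundary = lambda n: 10.0 + math.sqrt(n)` and the deterministic, slowly changing perturbation `perturbation = lambda n: 0.5 * math.log(n)`, the walk has positive drift and crosses the boundary after finitely many steps in practice.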

In the case of independent and identically distributed (i.i.d.) random
variables $\xi_n$ with common positive mean, nonlinear renewal theory concerning
boundary crossing times and its applications has been studied by Lai and
Siegmund (1977, 1979), Woodroofe (1976, 1977) and Zhang (1988), among
others. Good summaries of this topic can be found in Woodroofe (1982) and
Siegmund (1985) and references therein. For a perturbed Markov random
walk with $E_\pi \xi_1 > 0$, Melfi (1992) generalized Lai and Siegmund’s (1977)
results to study the limiting distribution of the overshoot crossing a
constant boundary. A multidimensional nonlinear first passage probability for
perturbed Markov random walks can be found in Fuh and Lai (2001).

A Markov chain $\{X_n, n \ge 0\}$ on a state space $\mathcal{X}$ is called $V$-uniformly
ergodic if there exists a measurable function $V: \mathcal{X} \to [1, \infty)$, with
$\int V(x)\, d\pi(x) < \infty$, such that, for any Borel measurable function $h$ on $\mathcal{X}$
satisfying $\|h\|_V := \sup_x |h(x)|/V(x) < \infty$, we have

$$\lim_{n \to \infty} \sup\left\{ \frac{|E(h(X_n) \mid X_0 = x) - \int h(x)\, d\pi(x)|}{V(x)} : x \in \mathcal{X},\ |h| \le V \right\} = 0.$$

In this section we shall assume that $\{X_n, n \ge 0\}$ is $V$-uniformly ergodic.
Under the irreducibility and aperiodicity assumption, $V$-uniform ergodicity
implies that there exist $r > 0$ and $0 < \rho < 1$ such that for all $h$ and $n \ge 1$,

(3.3)
$$\sup_{x \in \mathcal{X}} \frac{|E(h(X_n) \mid X_0 = x) - \int h(y)\, d\pi(y)|}{V(x)} \le r \rho^n \|h\|_V;$$

see pages 382 and 383 of Meyn and Tweedie (1993). When $V \equiv 1$, this
reduces to the classical uniform ergodicity condition.
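For a finite state chain with $V \equiv 1$, the geometric bound (3.3) can be seen directly. The sketch below computes the left-hand side of (3.3) for a given function $h$ by raising the transition matrix to the $n$th power; it is only an illustration on a toy chain, and the names `n_step_matrix` and `worst_case_gap` are hypothetical.

```python
def n_step_matrix(P, n):
    """n-step transition matrix P^n by repeated multiplication."""
    d = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(n):
        result = [[sum(result[i][k] * P[k][j] for k in range(d)) for j in range(d)]
                  for i in range(d)]
    return result

def worst_case_gap(P, pi, h, n):
    """sup_x |E(h(X_n) | X_0 = x) - sum_y h(y) pi(y)| for a finite chain with V = 1."""
    d = len(pi)
    Pn = n_step_matrix(P, n)
    stationary_mean = sum(h[y] * pi[y] for y in range(d))
    return max(abs(sum(Pn[x][y] * h[y] for y in range(d)) - stationary_mean)
               for x in range(d))
```

For the two-state chain with transition matrix `[[0.8, 0.2], [0.3, 0.7]]`, stationary distribution `[0.6, 0.4]` and `h = [1.0, 0.0]`, the gap equals $0.6 \cdot 0.5^n$, so (3.3) holds for this $h$ with $r = 0.6$ and $\rho = 0.5$, the second eigenvalue of the transition matrix.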

The following assumptions for Markov chains will be used in this section: 

A2. supj, < oo and sup^ < qq for some r > 1. 

A3. Let $\nu$ be an initial distribution of the Markov chain $\{X_n, n \ge 0\}$. Assume
that for some $r \ge 1$,

(3.4)
$$\sup_{\|h\|_V \le 1} \left| \int h(x) E_x |\xi_1|^r \, d\nu(x) \right| < \infty.$$


A Markov random walk is called lattice with span $d > 0$ if $d$ is the maximal
number for which there exists a measurable function $\gamma: \mathcal{X} \to [0, \infty)$, called the
shift function, such that
$P\{\xi_1 - \gamma(x) + \gamma(y) \in \{\ldots, -2d, -d, 0, d, 2d, \ldots\} \mid X_0 = x, X_1 = y\} = 1$
for almost all $x, y \in \mathcal{X}$. If no such $d$ exists, the Markov random
walk is called nonlattice. A lattice random walk whose shift function $\gamma$
is identically 0 is called arithmetic.

To establish the nonlinear Markov renewal theorem, we shall make use
of (3.1) in conjunction with the following extension of Cramér’s (strongly
nonlattice) condition [Götze and Hipp (1983), (2.5) on page 216]: There
exists $\delta > 0$ such that for all $m, n = 1, 2, \ldots$, $\delta^{-1} < m < n$, and all
$\theta \in R$ with $|\theta| \ge \delta$,

$$E_\pi \big| E\{\exp(i\theta(\xi_{n-m} + \cdots + \xi_{n+m})) \mid X_{n-m}, \ldots, X_{n-1}, X_{n+1}, \ldots, X_{n+m+1}\} \big| \le e^{-\delta}.$$
By using Markov renewal theory [Kesten (1974), Alsmeyer (1994), Fuh
and Lai (2001) and Fuh (2004)] and Wald’s equations for Markov random
walks [Fuh and Lai (1998) and Fuh and Zhang (2000)], our approach is
based on the investigation of the difference between $T_\lambda$ and a stopping time
crossing linear boundaries with varying drift. That is, we first define

(3.5)    $\tau := \tau(c, u) = \inf\{n \ge 1: S_n - un > c\}$, $c \ge 0$, $u < E_\pi \xi_1$,

and establish the uniform integrability of $|T_\lambda - \tau(c_\lambda, d_\lambda)|^p$ for $p \ge 1$, for
suitable $c_\lambda$ and $d_\lambda$. Then we derive nonlinear Markov renewal theory directly
from parallel results in the linear case via the uniform integrabilities and the
weak convergence of the overshoot.

Let $P_+^u(x, B \times R) = P_x\{X_{\tau(0,u)} \in B\}$ for $u \le E_\pi \xi_1$ denote the
transition probability associated with the Markov random walk generated by the
ascending ladder variable $S_{\tau(0,u)}$. Under the $V$-uniform ergodicity condition
and $E_\pi \xi_1 > 0$, a similar argument as on pages 655-656 of Fuh and Lai (2001)
yields that the transition probability $P_+^u(x, \cdot \times R)$ has an invariant measure
$\pi_+^u$. Let $E_+^u$ denote expectation under $X_0$ having the initial distribution $\pi_+^u$.
When $u = E_\pi \xi_1$, we denote $P_+^u$ as $P_+$, and $\tau_+ = \tau(0, E_\pi \xi_1)$. Define

(3.6)    $b = b_\lambda = \sup\{t \ge 1: A(t; \lambda) \ge t E_\pi \xi_1\}$, $\sup \emptyset = 1$,

(3.7)    $d = d_\lambda = \dfrac{\partial A}{\partial t}(b_\lambda; \lambda)$,

(3.8)    $\bar d = \sup\left\{\dfrac{\partial A}{\partial t}(t; \lambda): t \ge b_\lambda, \lambda \in \Lambda\right\}$,

(3.9)    $R = R_\lambda = Z_T - A(T; \lambda)$,

(3.10)   $R(c, u) = S_{\tau(c,u)} - u\tau(c, u) - c$, $c \ge 0$,

(3.11)   $r(u) = E_+^u R^2(0, u) / 2 E_+^u R(0, u)$, $u < E_\pi \xi_1$,

(3.12)   $G(r, u) = \int_r^\infty P_+^u\{R(0, u) > s\}\, ds / E_+^u R(0, u)$, $u < E_\pi \xi_1$, $r \ge 0$.

We shall assume that $A(t; \lambda)$ is twice differentiable in $t$ and $b_\lambda$ is finite
so that $d$ and $\bar d$ are well defined. The next theorem is a Blackwell-type
nonlinear Markov renewal theorem. In the case of i.i.d. random variables,
such a result was developed by Lai and Siegmund (1977). Melfi (1992)
extended their result to the Markov case under an ergodicity
assumption different from the one in this paper. Here, we consider a nonlinear
boundary, extending Zhang’s (1988) result under the $V$-uniform ergodicity
assumption. Since for $1/2 < \alpha \le 1$, $b^{-\alpha}(T - b) = o_{P_\nu}(1)$ implies (3.13) with
$\gamma(b)/b^\alpha \to 0$, Theorem 1 implies Theorem 3 of Melfi (1992).


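Theorem 1. Assume A1 holds, and A2 and A3 hold with $r = 1$. Let
$\nu$ be an initial distribution on $X_0$. Suppose there exist functions $\rho(\delta) > 0$,
$\sqrt{b} \le \gamma(b) \le b$, $\gamma(b)/b \to 0$ as $b \to \infty$, and a constant
$d^* < E_\pi \xi_1 \in (0, \infty)$ such that

(3.13)   $(T_\lambda - b_\lambda)/\gamma(b_\lambda) = O_{P_\nu}(1)$ as $b_\lambda \to \infty$,

(3.14)   $\lim_{n \to \infty} P_\nu\Big\{\max_{1 \le j \le \rho(\delta)\gamma(n)} |\eta_{n+j} - \eta_n| \ge \delta\Big\} = 0$ for any $\delta > 0$,

(3.15)   $\sup\Big\{\gamma(b) \Big|\dfrac{\partial^2 A}{\partial t^2}(t; \lambda)\Big| : |t - b| \le K\gamma(b),\ \lambda \in \Lambda\Big\} < \infty$ for all $K > 0$

and

(3.16)   $\lim_{b_\lambda \to \infty} d_\lambda = d^*$.

If $\xi_1 - d^*$ does not have an arithmetic distribution under $P_\nu$, then for any
$r \ge 0$,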

Pu{Xt € B,Rx> r} 

1 f 

(3.17) = (X) I Pf {RiO, P)>s}ds + 0 ( 1 ) 


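as $b_\lambda \to \infty$.

In particular, $P_\nu\{R_\lambda > r\} = G(r, d^*) + o(1)$, as $b_\lambda \to \infty$ for any $r \ge 0$. If, in
addition, $(T - b_\lambda)/\gamma(b_\lambda)$ converges in distribution to a random variable $W$
as $b_\lambda \to \infty$, then

(3.18)   $\lim_{b_\lambda \to \infty} P_\nu\{R_\lambda > r,\ T_\lambda \ge b_\lambda + t\gamma(b_\lambda)\} = G(r, d^*) P_\nu\{W \ge t\}$,

for every real number $t$ with $P_\nu\{W = t\} = 0$.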


The proof of Theorem 1 is given in Section 6. 

To study uniform integrabilities of the powers of the differences for linear
and nonlinear stopping times, we shall first give the regularity conditions
on $\eta = \{\eta_n, n \ge 1\}$. The process $\eta$ is said to be regular with $p \ge 0$ and
$1/2 < \alpha \le 1$ if there exist a random variable $L$, a function $f(\cdot)$ and a sequence of
random variables $U_n$, $n \ge 1$, such that

(3.19) rjn = f{n) + Un for n > L and sup < oo, 

x&X 

(3.20) max \f{n + j) — f{n)\<K, K<oo, 

(3.21) l^max \Un+j\^, n > l| is uniformly integrable, 

(3.22) sup Pxl max Un+j > 6*n“ 1 —> 0 as n ^ oo, for all 9 > 0, 

xex lo<i<n J 


and for some tc > 0 , w < — d if a = 1 , 

OO 

(3.23) y/ sup Px{—Un > wn'^} < oo. 

n=l xGX 

We shall set $f(n)$ to be the median of $\eta_n$ when $\eta$ is not regular and
extend $f$ to a function on $[1, \infty)$ by linear interpolation. Therefore, we can
define $\tau = \tau_\lambda = \tau(c_\lambda, d_\lambda)$ and $c_\lambda = b_\lambda(E_\pi \xi_1 - d_\lambda) - f(b_\lambda)$.


Theorem 2. Assume A1 holds, and A2 and A3 hold with $r = p'(p + 1)/\alpha$
for some $p \ge 1$, $p' > 1$ and $1/2 < \alpha \le 1$. Suppose $\eta$ is regular with
$p \ge 1$, $1/2 < \alpha \le 1$, and that there exist constants $\delta$ and $\mu^*$ with $0 < \delta < 1$
and $0 < \mu^* < E_\pi \xi_1$ such that

(3.24)   $b^p \sup_x P_x\{T \le \delta b\} \to 0$ as $b \to \infty$,

and

(3.25)   $\dfrac{\partial A}{\partial t}(t; \lambda) \le \mu^*$, $t \ge \delta b$, $\lambda \in \Lambda$.

(i) If supx^x } < oo for some p' > 1 and for any K >0, 


(3.26) 




(t;A) 


■b\- Kbf<t<bx + Kbx, 


A G A 


< oo, 


then 


(3.27)   $\{|T_\lambda - \tau_\lambda|^p; \lambda \in \Lambda\}$ is uniformly integrable under $P_\nu$.

(ii) Ifd^A/dt^ = 0, then (3.27) still holds without the condition < 

oo. 


The proof of Theorem 2 is given in Section 6 . 

We need the following notation and definitions before stating Theorem 
3. For a given Markov random walk {(X„,S'„),n > 0}, let u be an initial 
distribution of Xq and define £ B) on A. Let g = 

£^(^i|Xo,Wi) and ET^\g\ < oo. Define operators P and by (P 5 ')(a;) = 
Exg{x,XiAi) and P^^^ =respectively, and set g = Pg. We 
shall consider solutions A(x) = A{x‘,g) of the Poisson equation 

(3.28) (/— P)A = (/— Ptt)^, u*-a.s., P 7 rA = 0 , 

where I is the identity operator. Under conditions A1-A4, it is known [The¬ 
orem 17.4.2 of Meyn and Tweedie (1993)] that the solution A of (3.28) exists 
and is bounded. 

Theorem 3. Assume A1 holds, and A2 and A3 hold with $r = 2 + p$
for some $p \ge 1$. Let $\nu$ be an initial distribution such that $E_\nu V(X_0) < \infty$.
Suppose that

(3.29)   $\lim_{n \to \infty} \sup_{x \in \mathcal{X}} P_x\Big\{\max_{1 \le j \le \sqrt{n}} |\eta_{n+j} - \eta_n| \ge \delta\Big\} = 0$ for any $\delta > 0$,

(3.30)   $\eta_n = f(n) + U_n$ for any $n \ge L$,

and that there exist constants $d^* < E_\pi \xi_1$ and such that

(3.31)   $\lim_{n \to \infty} \max_{0 \le j \le \sqrt{n}} |f(n + j) - f(n)| = 0$,

(3.32)   $U_n$ converges in distribution to an integrable random variable $U$,

(3.33)   $\lim_{b_\lambda \to \infty} d_\lambda = d^*$ and $\xi_1 - d^*$ is nonarithmetic under $P_\nu$,
and for any constant $K > 0$


(3.34) lim sup 


bx^oo 



= 0 . 


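If $\{|T_\lambda - \tau_\lambda|; \lambda \in \Lambda\}$ is uniformly integrable, then

(3.35)   $E_\nu T_\lambda = b_\lambda - (E_\pi \xi_1 - d_\lambda)^{-1} f(b_\lambda) + C_0 + o(1)$ as $b_\lambda \to \infty$,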


where 



(3.36) 



The proof of Theorem 3 is given in Section 6. 

When $A(t; \lambda) = \lambda$, we have the following:

Corollary 1. Under the assumptions of Theorem 3, as $\lambda \to \infty$,



4. Asymptotic optimality of the SRP detection procedure. For ease of
notation, let $\mathcal{X} := D' \times P(R^d) \times P(R^d)$ be the state space of the Markov
chain $\{W_n^\theta, n \ge 0\}$ defined in (2.13). Denote $w := (y, \bar u, \bar u)$ and
$\bar w := (y_0, \bar\pi, \bar\pi)$, where $y_0 = (x_0, \xi_0) \in D'$ and $x_0$ is the initial state of
$X_0$ taken from $\pi$. To prove the asymptotic optimality of the SRP rule in hidden
Markov models, the following condition C will be assumed throughout this paper.

C1. For each $\theta \in \Theta$, the Markov chain $X = \{X_n, n \ge 0\}$ is ergodic
(positive recurrent, irreducible and aperiodic) on a finite state space
$D = \{1, \ldots, d\}$. Moreover, the Markov chain $\{(X_n, \xi_n), n \ge 0\}$ is irreducible,
aperiodic and $V$-uniformly ergodic for some $V$ on $D'$ with A1 and A2
holding. We also assume the Markov chain $\{(X_n, \xi_n), n \ge 0\}$ has stationary
probability $\Gamma$ with probability density $\pi_x(\theta) f(\cdot; \varphi_x(\theta))$ with respect to
$\mu$.

C2. For each $\theta \in \Theta$, the random matrices $M_0(\theta)$ and $M_1(\theta)$ defined in
(2.9) and (2.10) are invertible almost surely and

d 




sup El T^xi 0 )fif,O-,Tx{ 0 ))Pxy{ 0 )f.lf{^l-,^Pyi 9 )\f,o) <00. 
The construction of the SRP rule and the proof of its asymptotic
optimality can be split into two steps. We first prove that it is a limit of
Bayes rules, and then we prove the asymptotic optimality. To this end, let us
consider the Bayesian formulation of change point detection in a hidden Markov
model and denote it by $B(\beta, p, c, w)$. That is, we assume the initial state of
$W_0$ is $w$ and suppose $\omega$ has a prior distribution

$P_w(\omega = 0) = \beta$ and $P_w(\omega = n) = (1 - \beta) p (1 - p)^{n-1}$ for $n \ge 1$,

where $p$ and $\beta$ are known constants with $0 < p < 1$, $0 \le \beta < 1$. The parameter
$\omega$ is the (unknown) point of change of the process from a hidden Markov
model.

Let $N$ be a stopping time adapted to the system of $\sigma$-algebras
$\{\mathcal{F}_n, n \ge 0\}$, where $\mathcal{F}_0$ is the natural $\sigma$-algebra $\{\emptyset, \Omega\}$ and
$\mathcal{F}_n = \sigma(\mathcal{F}_0, W_0, W_1, \ldots, W_n)$.
Following the formulation of Shiryayev (1963, 1978) and its modification
given by Yakir (1994) for a finite state Markov chain, the risk associated
with the detection policy $N$ is

(4.1)    $\rho(N, \omega) = P_w(N < \omega) + c E_w(N - \omega)^+$,

where $a^+$ denotes $\max\{a, 0\}$, and $c > 0$ is a fixed constant.
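To fix ideas about the prior on $\omega$ and the risk (4.1), the following sketch evaluates both for the simplest possible policy, a deterministic alarm time $N = m$ that ignores the observations. It is only an illustration of the two terms in (4.1), not a Bayes rule; the names `prior_pmf` and `risk_of_fixed_time` and the truncation parameter `horizon` are hypothetical.

```python
def prior_pmf(n, beta, p):
    """P(omega = n): beta at n = 0, and (1 - beta) * p * (1 - p)^(n - 1) for n >= 1."""
    if n == 0:
        return beta
    return (1.0 - beta) * p * (1.0 - p) ** (n - 1)

def risk_of_fixed_time(m, beta, p, c, horizon=10000):
    """Risk P(N < omega) + c * E(N - omega)^+ of the deterministic rule N = m.

    The false alarm probability is a tail sum of the prior, truncated at
    `horizon`; the neglected tail is geometrically small.
    """
    false_alarm = sum(prior_pmf(n, beta, p) for n in range(m + 1, horizon))
    expected_delay = sum((m - n) * prior_pmf(n, beta, p) for n in range(0, m + 1))
    return false_alarm + c * expected_delay
```

For this rule the false alarm probability has the closed form $(1 - \beta)(1 - p)^m$, which makes the trade-off visible: increasing $m$ lowers the first term of (4.1) and raises the expected delay in the second.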

Definition 1. For a given pair $(p, w) \in (0, 1] \times \mathcal{X}$, we call a stopping
time $N^*$ a $B(\beta, p, c, w)$-Bayes time if

$\rho(N^*, \omega) = \inf \rho(N, \omega)$,

where inf is taken over the class of all proper stopping times.

The following proposition characterizes the structure of the Bayes rule
in hidden Markov models. Since the proof of the proposition is similar to
Shiryayev’s proof, it is omitted.

Proposition 1. Let $0 < p < 1$, $c > 0$ and let

$\delta_n = \delta_n(p, w) = P_w(\omega \le n \mid \mathcal{F}_n)$

be the posterior probability that the next observation is governed by $P^{\theta_1}$.
There exists a function $A_p(\cdot)$, defined on $\mathcal{X}$, such that the stopping time

(4.2)    $N_{A,p} = \inf\{n \ge 0: \delta_n(p, w) \ge A_p(W_n)\}$

is the $B(\beta, p, c, w)$-Bayes rule. Moreover, $A_p(\cdot)$ does not depend on $\beta$ or on
$w$.
Remark 1. Proposition 1 remains correct when the initial pair $(p, w)$
is random (according to a measure $\varphi$). Again, the threshold function does
not depend on the initial state. (Notice that the stopping time does depend
on the distribution of the initial state through the dependence on the initial
state of the probability of a change.) The structure of the Bayes rule plays
a crucial role in the development of the optimal detection time in the
non-Bayesian setting.


Denote

(4.3)    $r(x) = \dfrac{x}{(1 - x)p}$, $q = 1 - p$,

and let

(4.4) LRr.,, = r{f3)^ + ±^. 

^ k=o ^ 


It is convenient to reformulate the stopping time $N_{A,p}$ in terms of a
different sequence of statistics. By using the same idea as Lemma 2 of Pollak
(1985), it follows that

(4.5)    $\delta_n(p, w) = \dfrac{LR_{n,p}}{LR_{n,p} + 1/p}$.

Since the function $y/(y + 1/p)$ is a monotone function in $y$, the Bayesian
stopping time can be rewritten in terms of $LR_{n,p}$,

(4.6)    $N_{A,p} = \inf\{n \ge 0: \delta_n(p, w) \ge A_p(W_n)\} = \inf\{n \ge 0: LR_{n,p} \ge B_p(W_n)\} = N_{B,p}$,

where $B_p(\cdot) = r(A_p(\cdot))$. For consistency of notation, we will use $N_{B,p}$ instead
of $N_{A,p}$ in the sequel.

Theorem 4. Assume C1 and C2 hold. Suppose that the $P_\infty$-distribution
of $LR_1$ is nonarithmetic.

(i) There exists a $\delta > 0$ such that for any $\delta < b = \log B < \infty$, there exist
a constant $0 < c^* < \infty$ and a sequence $\{p_i, c_i\}_{i=1}^\infty$ with $p_i \to 0$, $c_i \to c^*$ as
$i \to \infty$ such that the stopping time $N_b$ defined in (2.7) is a limit as $i \to \infty$
of Bayes rules for $B(\beta = 0, p = p_i, c = c_i, w)$.

(ii) For any set of Bayes problems $B(\beta, p, c, w)$ with $\beta = 0$, $p \to 0$, $c \to c^*$,

(4.7)    $\limsup_{p \to 0,\ c \to c^*} \dfrac{1 - \rho(N_{B,p}, \omega)}{1 - \rho(N_b, \omega)} = 1$,

where the expectation is taken in the Bayes problems $B(0, p, c, w)$.

(iii) For any $1 \le \gamma < \infty$, there exists a unique $1 < b = \log B < \infty$ such
that $\gamma = E_\infty N_b$.

The proof of Theorem 4 is given in Section 7.

After understanding the structure of the Bayes rules for detecting a change
in hidden Markov models and the characteristics of the limits of such rules,
we can turn our attention to the problem of detecting a change in a
non-Bayesian setting. To study randomization of the initial value for the SRP
change point detection rule, we first need the following notation.

For 0 < fe < n, let 

/X o', 1 Pn(Cfc) • • • ) ^ 1 ) 

rin,p •— 2 ^ 77 7 c . a \ ' 

^_Q Q Pniski ^fc+1) • • • ) ^n'l uq) 

Note that $R_{n,p} = LR_{n,p}$ when $\beta = 0$. By using the same notation as that
in Section 2, for yo,yi G D', u = u{6) = (u(0o),u(^i)) G P(R‘^) x P{R‘^) and 
M = M{yo, yi) = M{9) = (M(0o), M(0i)) G Gl{d, R) x Gl{d, R), let P: [d'x 

p(R‘‘)xP(ir‘))x (O' xP(R‘‘)xP(R-))^R be 0(Sm,a), (yi, MS)) = Kig;i$;i | |;;| | :ig;i | | . 

For rr(9„),,(9,) e P{R'‘), denote f)(ir„,ir„) = S;Sj:ja;w ii :%) ii ■ Then 

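(4.9)
$$\frac{p_n(\xi_0, \ldots, \xi_n; \theta_1)}{p_n(\xi_0, \ldots, \xi_n; \theta_0)}
= \frac{\|M_n(\theta_1) \cdots M_1(\theta_1) M_0(\theta_1) \pi(\theta_1)\|}{\|M_n(\theta_0) \cdots M_1(\theta_0) M_0(\theta_0) \pi(\theta_0)\|}$$

$$= \frac{\|T_n(\theta_1)\pi(\theta_1)\| / \|T_{n-1}(\theta_1)\pi(\theta_1)\|}{\|T_n(\theta_0)\pi(\theta_0)\| / \|T_{n-1}(\theta_0)\pi(\theta_0)\|}
\cdots
\frac{\|T_1(\theta_1)\pi(\theta_1)\| / \|T_0(\theta_1)\pi(\theta_1)\|}{\|T_1(\theta_0)\pi(\theta_0)\| / \|T_0(\theta_0)\pi(\theta_0)\|}
\cdot
\frac{\|T_0(\theta_1)\pi(\theta_1)\| / \|\pi(\theta_1)\|}{\|T_0(\theta_0)\pi(\theta_0)\| / \|\pi(\theta_0)\|}$$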

= /3(1T^1,0 • • • /?«, O • /3«, O 

is a product of the functional for the Markov chain {W^,n > 0}. Therefore, 

(4.8) can be rewritten as 

n 

(4.10) Rn,p = E - • • • mLi.Wt) where = <. 

k=o ^ 
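As an illustration of how (4.9) factors a likelihood ratio into one-step terms, the following Python sketch treats a finite-state hidden Markov model with a finite alphabet. It is a minimal sketch under stated assumptions, not the paper's construction: the names `forward_norms`, `brute_force_likelihood`, `one_step_ratios`, `MODEL0`, `MODEL1` and `OBS` are hypothetical, the emission law is assumed to depend on the current hidden state only, and the L1 norm of the unnormalized forward vector plays the role of $\|T_n(\theta)\pi(\theta)\|$.

```python
from itertools import product


def forward_norms(obs, init, trans, emit):
    """L1 norms of the unnormalized forward vectors alpha_0, ..., alpha_n."""
    d = len(init)
    alpha = [init[x] * emit[x][obs[0]] for x in range(d)]
    norms = [sum(alpha)]
    for sym in obs[1:]:
        # One matrix step: propagate through the transition, weight by the emission.
        alpha = [sum(alpha[x] * trans[x][y] for x in range(d)) * emit[y][sym]
                 for y in range(d)]
        norms.append(sum(alpha))
    return norms


def brute_force_likelihood(obs, init, trans, emit):
    """Sum the joint probability over every hidden path (tiny examples only)."""
    d, total = len(init), 0.0
    for path in product(range(d), repeat=len(obs)):
        prob = init[path[0]] * emit[path[0]][obs[0]]
        for k in range(1, len(obs)):
            prob *= trans[path[k - 1]][path[k]] * emit[path[k]][obs[k]]
        total += prob
    return total


def one_step_ratios(obs, model0, model1):
    """Factors whose product is p(obs; model1) / p(obs; model0)."""
    n0 = forward_norms(obs, *model0)
    n1 = forward_norms(obs, *model1)
    ratios, prev0, prev1 = [], 1.0, 1.0  # the initial distribution has L1 norm 1
    for a0, a1 in zip(n0, n1):
        ratios.append((a1 / prev1) / (a0 / prev0))
        prev0, prev1 = a0, a1
    return ratios


# Hypothetical two-state models (init, trans, emit) over a binary alphabet.
MODEL0 = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.4, 0.6]])
MODEL1 = ([0.5, 0.5], [[0.6, 0.4], [0.3, 0.7]], [[0.2, 0.8], [0.5, 0.5]])
OBS = [0, 1, 1, 0, 1]
```

The product of the values returned by `one_step_ratios(OBS, MODEL0, MODEL1)` telescopes to the full likelihood ratio, which is the property the display above exploits. For long sequences one would normalize the forward vector at each step to avoid underflow; that refinement is omitted here.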

Define

$$R_{n+1,p} = \frac{\beta(W_n, W_{n+1})}{q}\,(1 + R_{n,p}), \qquad N_{q,b} := \inf\{n : R_{n,p} > B\},$$

$$F_n(s, w) = P_\infty(R_{n+1,p} \le s \mid N_{q,b} > n,\ W_n = w),$$

$$\rho(t, s, w, w') = P_\infty(R_{n+1,p} \le s,\ W_{n+1} \in dw' \mid R_{n,p} = t,\ N_{q,b} > n+1,\ W_n = w),$$

$$\zeta(t, w, w') = P_\infty(N_{q,b} > n+1,\ W_{n+1} \in dw' \mid R_{n,p} = t,\ N_{q,b} > n,\ W_n = w).$$

For a given set of nonnegative boundary points $B = \{B(w) : w \in \mathcal{X}\}$ (infinity
is not excluded), consider the set $S_B = \{(r, w) : w \in \mathcal{X},\ 0 \le r \le B(w)\}$.
Let $\mathcal{F}_B$ be the set of distribution functions with support in $S_B$. Let $T_B$ be
the transformation on $\mathcal{F}_B$ defined by

(4.11)
$$T_B F(r, w) = \frac{1}{Q(F)} \int_{w' \in \mathcal{X}} \int_0^{B(w')} \rho(t, r, w, w')\, \zeta(t, w, w')\, P(w, dw')\, dF(t, w'),$$

where

(4.12)
$$Q(F) = \int_{w' \in \mathcal{X}} \int_0^{B(w')} \zeta(t, w, w')\, P(w, dw')\, dF(t, w').$$


The idea behind (4.11) and (4.12) comes from iterated random functions,
which Pollak (1985) used to define a change point detection rule in the
independent case. Here $F_n(s, w)$ is driven by the Markov chain $\{W_n, n \ge 0\}$
and, hence, lies in the domain of Markovian iterated random functions. Under
some regularity conditions on the Markov chain $\{W_n, n \ge 0\}$, and the
continuity property for the iterated random functions, we will show in Lemma
8 that for each $B$ there is an associated set of invariant measures $\Phi_B$, that
is, $T_B \phi = \phi$ for all $\phi \in \Phi_B$. Let $p = 1 - q$ and define $\tilde\phi$ as

$$d\tilde\phi(s, w) = \frac{(1 + ps)\, d\phi(s, w)}{\int (1 + pt)\, d\phi(t, w)}.$$

It is easy to see that if the distribution of $R_{0,p}$ is $\tilde\phi$, then the distribution
of $R_{0,p}$ conditional on $\{\omega > 0\}$ is $\phi$. Note that $\tilde\phi$ depends on $p$. By using
the same argument as that in Theorem 4, we can choose a subsequence
$\{T_{B,p_i}, c_i, \tilde\phi_i\}$ such that as $i \to \infty$, $p_i \to 0$, $c_i \to c^*$ and $\tilde\phi_i$ converges in
distribution to a limit $\psi$.

Given the value of the initial state $W_0 = w$, the initial $(R_0^*, w)$ is simulated
from the distribution $\psi$, conditioned on the event $\{W_0 = w\}$. Define
recursively

(4.13) $R^*_{n+1} = \beta(W_n, W_{n+1})(1 + R^*_n)$.


Denote $b = \log B$, and define the SRP rule

(4.14) $N_b^\psi := \inf\{n : R^*_n > B\} = \inf\{n : \log R^*_n > b\}$.
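The recursion and stopping rule can be sketched in a few lines of Python. This is a minimal sketch, assuming the recursion has the form $R^*_{n+1} = \beta(W_n, W_{n+1})(1 + R^*_n)$ quoted in the proof of Lemma 12 and that the rule stops at the first strict exceedance of the threshold. The function name `srp_stopping_time` and its arguments are hypothetical: `betas` is a precomputed list of one-step likelihood ratio factors, and `r0` stands in for a draw of the initial value from the randomizing distribution $\psi$.

```python
def srp_stopping_time(betas, threshold, r0=0.0):
    """Return (n, R_n) at the first n with R_n > threshold, else (None, last R)."""
    r = r0
    for n, beta in enumerate(betas, start=1):
        r = beta * (1.0 + r)   # recursion of the form (4.13)
        if r > threshold:      # stopping rule of the form (4.14)
            return n, r
    return None, r
```

For example, `srp_stopping_time([2.0, 2.0, 2.0], 10.0)` passes through the values 2, 6 and 14 and stops at the third observation, while a larger starting value `r0` can only bring the alarm forward.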


Notice that each one of these detection policies is an “equalizer rule” in
the sense that

(4.15) $E_k(N_b^\psi - k + 1 \mid N_b^\psi > k - 1) = E_1 N_b^\psi$

for all $k \ge 1$. The same is true for the case where $\psi$ has atoms on the
boundary, since the randomization law is time independent.

Note that the threshold of the Bayes rule (4.6) depends on the current
state of the Markov chain, while the threshold of the SRP rule (4.14) is a
constant. We claim in Lemma 7 and Theorem 5 that the difference between
these two rules is $o(1)$ as $\gamma \to \infty$, by which we prove the conjecture raised
by Yakir (1994) for finite state Markov chains.


Theorem 5. Assume C1 and C2 hold. Suppose that the $P_\infty$-distribution
of $LR_1$ is nonarithmetic. Then for any $1 < \gamma < \infty$, there exist a constant
$\delta < b = \log B < \infty$ and a probability measure $\psi$ such that $\gamma = E_\infty N_b^\psi$ and
such that if $N$ is any stopping time which satisfies $E_\infty N \ge \gamma$, then

(4.16)
$$\sup_{1 \le \omega < \infty} E_\omega(N - \omega \mid N > \omega) \ge \sup_{1 \le \omega < \infty} E_\omega(N_b^\psi - \omega \mid N_b^\psi > \omega) + o(1),$$

where $o(1) \to 0$ as $\gamma \to \infty$, and $E_\omega(N_b^\psi - \omega \mid N_b^\psi > \omega)$ is a constant for $1 \le \omega < \infty$.


The proof of Theorem 5 is given in Section 7. 


5. Asymptotic approximations for the average run length. Since $N_b^\psi$ is
an equalizer rule in the sense of $E_k(N_b^\psi - k + 1 \mid N_b^\psi > k - 1) = E_1 N_b^\psi$ by
(4.15), in this section we consider only the approximation of $E_1 N_b^\psi$. For
$\theta = \theta_0$ or $\theta_1$, let $\pi^\theta$ denote the stationary distribution of $\{X_n, n \ge 0\}$ under
$P^\theta$. For given $P^{\theta_0}$ and $P^{\theta_1}$, define the Kullback-Leibler information numbers
as in (4.2) of Fuh (2003),


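(5.1)
$$K(P^{\theta_0}, P^{\theta_1}) = E_{P^{\theta_0}} \left( \log \frac{\|M_1(\theta_0) M_0(\theta_0) \pi^{\theta_0}\|}{\|M_1(\theta_1) M_0(\theta_1) \pi^{\theta_1}\|} \right),$$
$$K(P^{\theta_1}, P^{\theta_0}) = E_{P^{\theta_1}} \left( \log \frac{\|M_1(\theta_1) M_0(\theta_1) \pi^{\theta_1}\|}{\|M_1(\theta_0) M_0(\theta_0) \pi^{\theta_0}\|} \right),$$

where $P^{\theta_0}$ ($P^{\theta_1}$) denotes the probability of the Markov chain $\{W_n^{\theta_0}, n \ge 0\}$
($\{W_n^{\theta_1}, n \ge 0\}$), and $E_{P^{\theta_0}}$ ($E_{P^{\theta_1}}$) refers to the expectation under $P^{\theta_0}$ ($P^{\theta_1}$).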

In the rest of this section we will impose the following mild condition on 
the Kullback-Leibler information numbers: 


(5.2) $0 < K(P^{\theta_0}, P^{\theta_1}) < \infty$ and $0 < K(P^{\theta_1}, P^{\theta_0}) < \infty$.


To derive a second-order approximation for the average run length of
the SRP rule, we will apply relevant results from nonlinear Markov renewal
theory developed in Section 3. For this purpose, we rewrite the stopping
time $N_b := N_b^\psi$ (we delete $\psi$ in this section for simplicity) in the form of a
Markov random walk crossing a constant threshold plus a nonlinear term
that is slowly changing. Note that the stopping time $N_b$ can be written in
the form


(5.3) $N_b = \inf\{n \ge 1 : S_n + \eta_n > b\}, \qquad b = \log B$,

where $S_n$ is a Markov random walk defined in (2.15) with mean $E_1 S_1 =
K(P^{\theta_1}, P^{\theta_0})$, and

(5.4) $\eta_n = \log\Big\{1 + \sum_{k=1}^{n-1} e^{-S_k}\Big\}$.
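The decomposition of the log statistic into a random walk plus a nonlinear term can be checked numerically. The sketch below is a hedged illustration rather than the paper's argument: it assumes the recursion $R_{n+1} = \beta_{n+1}(1 + R_n)$ with $S_n = \sum_{i \le n} \log \beta_i$, and the function names and the starting value `r0` are hypothetical. With `r0 = 0` the nonlinear term is $\log\{1 + \sum_{k=1}^{n-1} e^{-S_k}\}$; a nonzero `r0` simply adds `r0` inside the logarithm.

```python
import math


def log_sr_direct(log_betas, r0=0.0):
    """log R_n computed directly from the recursion R_n = beta_n * (1 + R_{n-1})."""
    r, out = r0, []
    for lb in log_betas:
        r = math.exp(lb) * (1.0 + r)
        out.append(math.log(r))
    return out


def log_sr_decomposed(log_betas, r0=0.0):
    """log R_n computed as S_n + eta_n, with eta_n = log(1 + r0 + sum_{k<n} exp(-S_k))."""
    s, tail, out = 0.0, 0.0, []
    for lb in log_betas:
        s += lb                           # random walk S_n
        eta = math.log(1.0 + r0 + tail)   # slowly changing term eta_n
        out.append(s + eta)
        tail += math.exp(-s)              # extend the sum to include k = n
    return out
```

Both functions return the same sequence up to rounding, and the term `eta` is nondecreasing in `n` because each summand `exp(-S_k)` is positive.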

For $b > 0$, define

(5.5) $N_b^* = \inf\{n \ge 1 : S_n > b\}$,

and let $R_b = S_{N_b^*} - b$ (on $\{N_b^* < \infty\}$) denote the overshoot of the statistic $S_n$
crossing the threshold $b$ at time $n = N_b^*$. When $b = 0$, we denote $N_b^*$ in
(5.5) as $N_+^*$. For given $w \in \mathcal{X}$, let

(5.6) $G(y) = \lim_{b \to \infty} P_1\{R_b \le y \mid W_0 = w\}$

be the limiting distribution of the overshoot. It is known [cf. Theorem 1 of
Fuh (2004)] that

$$\lim_{b \to \infty} E_1(R_b \mid W_0 = w) = \int_0^\infty y\, dG(y) = \frac{E_{m_+} S^2_{N_+^*}}{2 E_{m_+} S_{N_+^*}},$$

where $m_+$ is defined in the same way as $\pi_+$ defined in the paragraph before
(3.6) in Section 3.

Note that by (5.3),

$$S_{N_b} = b - \eta_{N_b} + \chi_b \quad \text{on } \{N_b < \infty\},$$

where $\chi_b = S_{N_b} + \eta_{N_b} - b$ is the overshoot of $S_n + \eta_n$ crossing the boundary
$b$ at time $N_b$. Taking the expectations on both sides, and applying Wald’s
identity for products of Markovian random matrices [Theorem 2 of Fuh
(2003)], we obtain

(5.7)
$$K(P^{\theta_1}, P^{\theta_0})\, E_1(N_b \mid W_0 = w) - \int_{\mathcal{X}} \Delta(w')\, dm_+(w') + \Delta(w)$$
$$= E_1(S_{N_b} \mid W_0 = w) = b - E_1(\eta_{N_b} \mid W_0 = w) + E_1(\chi_b \mid W_0 = w),$$

where $\Delta : \mathcal{X} \to \mathbf{R}$ solves the Poisson equation

(5.8) $E_w \Delta(W_1) - \Delta(w) = E_w S_1 - E_m S_1$

for almost every $w \in \mathcal{X}$ with $E_m \Delta(W_1) = 0$.
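To make the Poisson equation concrete, the following Python sketch solves its finite-state analogue. It is an illustration under simplifying assumptions and not the construction used in the paper: the chain is a hypothetical ergodic chain on finitely many states with transition matrix `P`, the vector `g` stands in for the conditional mean increment $E_w S_1$, and the solution is built from the truncated series $\Delta = -\sum_{n \ge 0} P^n (g - \pi g)$, which satisfies $P\Delta - \Delta = g - \pi g$ and $\pi \Delta = 0$.

```python
def mat_vec(P, v):
    """Apply the transition matrix to a function on the state space."""
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]


def stationary(P, iters=2000):
    """Stationary distribution by power iteration (assumes an ergodic chain)."""
    d = len(P)
    pi = [1.0 / d] * d
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(d)) for j in range(d)]
    return pi


def solve_poisson(P, g, iters=2000):
    """Return (delta, pi) with P delta - delta = g - pi(g) and pi(delta) = 0."""
    pi = stationary(P, iters)
    mean = sum(p * x for p, x in zip(pi, g))
    term = [x - mean for x in g]       # centered increment g - pi(g)
    delta = [0.0] * len(g)
    for _ in range(iters):
        delta = [a - t for a, t in zip(delta, term)]  # accumulate -P^n (g - pi g)
        term = mat_vec(P, term)
    return delta, pi
```

For the two-state chain `P = [[0.9, 0.1], [0.2, 0.8]]` with `g = [1.0, -0.5]`, the centered vector is an eigenvector of `P` with eigenvalue 0.7, so the series sums in closed form to `[-5/3, 10/3]`, which the code reproduces.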

The crucial observations are that the sequence $\{\eta_n, n \ge 1\}$ is slowly
changing, and that $\eta_n$ converges $P_1$-a.s. as $n \to \infty$ to the random variable

(5.9) $\eta = \log\Big\{1 + \sum_{k=1}^{\infty} e^{-S_k}\Big\}$

with finite expectation $E_{m_+}\eta$. Here the expectation is taken under
$\omega = 1$ and the initial distribution of $W_0$ is $m_+$; we omit 1 for simplicity. An
important consequence of the slowly changing property is that, under mild
conditions, the limiting distribution of the overshoot of a Markov random
walk over a fixed threshold does not change by the addition of a slowly
changing nonlinear term (see Theorem 1).

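Theorem 6. Assume C1 and C2 hold. Let $\xi_0, \xi_1, \ldots, \xi_n$ be a sequence
of random variables from a hidden Markov model $\{\xi_n, n \ge 0\}$. Assume that
$S_1$ is nonarithmetic with respect to $P_\infty$ and $P_1$. If $0 < K(P^{\theta_1}, P^{\theta_0}) < \infty$,
$0 < K(P^{\theta_0}, P^{\theta_1}) < \infty$, and $E_1|S_1|^2 < \infty$, then for $w \in \mathcal{X}$, as $b \to \infty$,

(5.10)
$$E_1(N_b \mid W_0 = w) = \frac{1}{K(P^{\theta_1}, P^{\theta_0})} \left( b - E_{m_+}\eta + \frac{E_{m_+} S^2_{N_+^*}}{2 E_{m_+} S_{N_+^*}} - \int \Delta(w')\, dm_+(w') + \Delta(w) \right) + o(1).$$

The proof of Theorem 6 is given in Section 8.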

Remark 2. The constants $E_{m_+} S^2_{N_+^*} / 2E_{m_+} S_{N_+^*}$ and $E_{m_+}\eta$ are the
subject of the nonlinear renewal theory. The constant $-\int \Delta(w')\, dm_+(w') + \Delta(w)$
is due to Markovian dependence via the Poisson equation (5.8). Obviously,
this bound is asymptotically accurate when Ar(P^i,P^°) ^ 0 . 


6. Proofs of Theorems 1-3. We will use the same notation as that in 
Section 3 unless specifically mentioned. 


Proof of Theorem 1. To prove Theorem 1, we can make use of Theorem
3.1 for the one-dimensional case in Fuh and Lai (2001) as in the case
of i.i.d. $\xi_n$ [see Theorem 1 of Zhang (1988)]. The details are omitted. □

To prove Theorem 2, we need some lemmas first.


Lemma 1. Let v be an initial distribution such that E^V{Xq) < oo. Let 
t{c,u) be defined by (3.5), and let p and a be two constants with p>l and 
l/2<a<l. 

(i) If /°‘V{Xq)) < oo for some p' > 1, then 


( 6 . 1 ) 


^Pi/|max — jj > 7 n“l < 


oo 


for any 7 > 0 . 
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(ii) If V(Xq)) < oo for some p' > 1, then for any K >{), 

{((r(c, li) — (1 — u)~^ c)^ / cY \ c > 1, K~^ — u< K} 

is uniformly integrahle under P^. 

Proof. By Theorem 16.0.1 of Meyn and Tweedie (1993), the $V$-uniform
ergodicity condition is equivalent to the fact that there exist an extended
real-valued function $w : \mathcal{X} \to [1, \infty)$, a measurable set $C$ and constants $\gamma > 0$,
$b < \infty$, such that

$$\int_{\mathcal{X}} w(y) P(x, dy) - w(x) \le -\gamma w(x) + b\, 1_C(x) \quad \text{for } x \in \mathcal{X},$$

where $w$ is equivalent to $V$ in the sense that for some $c \ge 1$, $c^{-1}V \le w \le cV$.
Denote Ai = A as defined in (3.28). Let g = E{d\\XQ,Xi) and A 2 {x;g) = 
A{x]g). Since f V(x) dTr{x) < 00, A2 implies that there exists 0 < c < 00 
such that for all x G X, Li(|^ip|Ao = x) < cV(x). By Theorem 17.4.2 of 
Meyn and Tweedie (1993), the solution A^ satisfies A^ < Rj-iV{x) + 1) for 
r = 1,2. This implies that E^\Ar{Xi] g)\ < RrSupiE^(y{Xi) -|- 1) < 00 
for r = 1,2. Therefore, the conditions of Theorem 2 of Fuh and Zhang (2000) 
hold, and, hence, the quick convergence (i) follows from Theorem 2 of Fuh 
and Zhang (2000). 

(ii) The proof of (ii) can be derived from (i) easily. □ 

Following Lemmas 2 and 3 in Zhang (1988), we have the following: 

Lemma 2. Suppose that y is regular with p>l and 1/2 < a < 1 and that 
conditions (3.24) and (3.25) hold. //£'j,(|^i|p'(^’+^)/"P(Xo)) < 00 for some 
p' > 1, then 


lim lfPu{Tx <b — 76 "} = 0 for any 7 > 0 . 

6—^■co 


Lemma 3. Suppose that rj is regular with p>l and 1/2 < a < 1 and that 
condition (3.25) holds. Denote n* = [b + Kb'^]. If E^{\ii\P'{X q)) < 00 
for some p' > 1, then there exists a constant K > 0 such that 

OO 

lim nP~^Pi,{Tx > n} = 0 . 

b—^oo ^ 

n=n* 


Lemma 4. Under the conditions of Theorem 2(i), 

{((T;v — tYY'i is uniformly integrable under Py. 
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Proof. For ease of notation, let T = Tx, ni = [b — 76 "], n* = [6 + 
Kb'^], T'= max(ni,min(T,n*)) and r'= niax(ni,min(r,n*)). By Lemmas 
l(i), 2 and 3, 

(6.2) lim E^\T - T'\p = lim E^\t - t\p = 0. 

6—^■oo b^oo 

Let 7 ' = (1 — ^*)/5, where is defined in Theorem 2. By (6.5) in Zhang 
(1988), there exists a constant K* < 00 , such that 

CXD 

^ nP-^P^{T' >t' + n} 

n=nQ 

00 00 

< ^ nF~^Pu{Sr + n- Sr+n>l'n}+ ^ 'nP~^Pty 

n=nQ n=no 

00 

+ ^ nP-^P^{K*{T-bf/b>-f^n} 

n=no 
00 

+ ^ nP~^Piy{K*\T — b\ni^^‘^ >'j^n} + o{l). 

n=no 

It follows from Lemma l(i), (ii), (3.21) and the condition £'j,{|^ipPP'F(Xo)} < 
00 for some p' > 1 that 

00 

n^“^Pj,{T > r'+ re} ^ 0 as min(reo, 6 ) ^ 00 . 

n=no 

This proves the uniform integrability of {{T — t)~^p} since the uniform inte- 
grability of {T^; 6 < 5*, A G A} for any given b* is implied by (6.2) in Zhang 
(1988). □ 

Proof of Theorem 2. (i) For ease of notation, let T = Tx, ni = [b — 

76 “], re 2 = [& + 7 &“], T'= max(rei,min(r,re 2 )) and P = max(rei,min(T,re 2 )). 
Though re 2 is different from re* in Lemma 3, (T —T')"*" < (T —r)^ + (r —re 2 )^, 
and by Lemma 4 we have 

(6.3) lim E^\T - T'\p = lim E^\t - t'\p = 0. 

6—^00 6—^00 

Clearly, 

P 4 T'>r' + re} 

< Py{L > rei} + P,y{T < re-ij + Py{T > re 2 } 

+ Pu{L <rei<T<T + re<r< re 2 }. 


< max U, ^ > j'n 

[^ni<j<n* 


(6.4) 




By (3.19) and Lemma 2, 

oo n2—n\ 

Y, nP~^P^{T' >t' + n}= Y nP-^Py{T' >t' + n} 
n=no n=no 

(6.5) 

n2-ni 

— X! rvP~^Py{L <ni <T <T+ n <T <n 2 } + o{l). 

n=nQ 

On the event {L<ni<T<T + n<T< 712 }, 

Sr+n + f{b) <b + d{T + n-h) 

(6.6) 

< ii*n + {h + d{T -b)- A{T- A)) + St + Ut + f{T), 

and by an argument similar to (6.10) of Zhang (1988), there exists a finite 
constant K* that does not depend on 7 , and 6* = 'yK*b°‘~^ + K*b~^^‘^, and 
by (6.5), 

(6.7) St+u - St < ^J*n + UT + 5 *\t' - T'\ + K*{t - hf /b + K*\t - b\b-^/‘^. 
Therefore, it follows from (6.5) and (6.7) that for 7 ' = (1 — /r*)/5, 

CXD 

^ rP-^Py{T' >T' + n} 

n=no 

CXD CXD 

< Y rP~^Py{ST+ n-ST+n>in} + Y 'n^~^Pu 

n=no n=no 

00 

+ ^ nP-^P^{K*{T-hf/b + K*\T-b\n-^/'^>-i'n} 

n=no 

00 

+ Y, nP~^Py{6*{T'— T') >6'n} + o{l) as min(no, 6 ) ^ 00 . 

n=no 

Since 7 is arbitrary, we can choose 7 small enough such that {46* < 2 . 

Hence, it follows from ( 6 . 8 ), Lemma l(i), (ii), (3.21) and Lemma 4 that as 
min(no, 6 ) —> 00 , 

OO 00 

(6.9) Y nP-^PuW -T' >n}< Y - T') > 7 n} + o(l). 

n=nQ n=no 

And by (6.3) and Lemma 4, J2’?^=no n^~^Pu{T' — T'>n} = o(l), and 

{|T — r|^; A G A} is uniformly integrable. 

(ii) For the case where d‘^A/dt^ = 0, the term (r — 6)“^/b disappears 
throughout the proof of (i). □ 


< max Uj > 'y'n 
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Proof of Theorem 3. By definition, St = R + T(T ■,X) — r]T and Sr = 
R{cx,d) + + d{T — b) — f{b). It follows that 

St-S r- d{T - t) 

(6.10) 

= R- R{cx,d) + [A{T-, X)-fib- d{T - b)] - [r,T - f{b)]. 

Recall A defined in (3.28). Let di = — // + A{xi;g) — A{xo]g), g = 

E{di\XQ, Xi), = Erd\ and A 2 {x;g) =A{x;g). Since 

St — fJ-T — Sr + fiT 

= [('S'max(r,r) “ niax(r, r)) - {Sr - fir)] 

- [{Sr-fir) - -//min(r,r))], 

and A1-A3 imply that the conditions in Fuh and Zhang (2000) hold as shown
in Lemma 1, it follows from Wald’s equation for Markov chains for
second moments in Corollary 1 of Fuh and Zhang (2000) that

E,{ST-Sr-fl{T-T)f 

= cr^[£'y(max(r,T) - r) + E^{t - min(r,T))] 

- 2EA{St -Sr- fi{T - T))A{X\T-r\)} 

+ E,{A2{X\T-r\) - ^2{Xo)] 

= {a‘^-2fi)E^\T-T\+0{l). 

Therefore, by Theorem 2, St — Sr — d{T — r) is uniformly integrable. By 
Anscombe’s type central limit theorem for a Harris recurrent Markov ran¬ 
dom walk [Theorem 1 of Malinovskii (1986)], r — b/x/b ^ {fi — dl)aN{0, 1) 
in distribution. It follows that, as 6 —> oo, 

rj-i 

(6.11) — -j= ->(T*A(0,1) in distribution by (3.27), 

Vb 

(6.12) lim Px{R> r} = limPx{R{c,d) > r} = G{r,d*) by Theorem 1, 

6—»'Oo 

A{T-, X)-fib- d{T -b)^ d*2{a*N{0, l)f/2 

(6.13) 

in distribution by (3.34), 

(6.14) gT — f{b) —> 1/ in distribution by (3.14), (3.31), (3.32), 

where c = c\ = {fi — d\)b\ — f{bx) and a* = {g, — dl)~^a. 

By an argument similar to Theorem 3(i) of Fuh and Lai (1998), R{cx,dx) 
is uniformly integrable. Hence, 

Eu{St -Sr- d{T - r)) = {g- d\)-‘^(j'^d\l2 - E^U + o(l), 

E^T = E^t + {g- dl)-^a‘^dH2 - {g - d^-^E^U + o(l), 
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and 

E^t = ifJ-- d)~^c+ {fi - d*)~^ (^r{d*) - J A{x)d{7r'^ (x) - + o(l) 

= bx-{n-d)~^f{bx) 

+ {H- dl)-^ - J A(x) d(7rf (x) - i/(x))^ + o(l). 

This completes the proof. □ 

7. Proofs of Theorems 4 and 5. To establish asymptotic optimality of
the SRP rule and derive the second-order asymptotic approximation for
the average run length, we need to apply nonlinear Markov renewal theory
developed in Section 3. Note that the Markov chain $\{W_n^\theta, n \ge 0\}$ on
$\mathcal{X} := \mathcal{D}' \times P(\mathbf{R}^d) \times P(\mathbf{R}^d)$ is induced by the products of random matrices
$\{M_n, n \ge 0\}$. A positivity hypothesis on the matrices in the support of the
Markov chain leads to contraction properties, on which basis the spectral
theory is developed in Fuh (2003). Another natural hypothesis is that the
transition probability possesses a density. This leads to a classical situation
in the context of the so-called “Doeblin condition” for Markov chains. It
also leads to precise results of the limiting theory and has been used to
prove a nonlinear renewal theory in Section 3. We summarize the behavior
of $\{W_n^\theta, n \ge 0\}$ in the following proposition. Note that in the case of i.i.d.
iterated random functions satisfying Lipschitz conditions, similar results can
be found in Theorems 2.1, 2.2 and Corollary 2.3 of Alsmeyer (2003). Here
we generalize it to Markovian products of random matrices.

Proposition 2. Consider a given hidden Markov chain as in (2.1)
and (2.2) satisfying C1 and C2, and let $\theta = (\theta_0, \theta_1) \in \Theta \times \Theta$ be the
parameters. Then the induced Markov chain $\{W_n^\theta, n \ge 0\}$ defined in (2.13)
is an aperiodic, $\mu$-irreducible and Harris recurrent Markov chain. Moreover,
it is also a $V$-uniformly ergodic Markov chain for some $V$ on $\mathcal{X}$. And
we have $\sup_w \{E_w(V(W_1))/V(w)\} < \infty$, and there exist $a, C > 0$ such that
$E_w(\exp\{a\chi(M_1)\}) < C$ for all $w = (y, u, v) \in \mathcal{X}$.

Proof. For simplicity of notation, we delete $\theta$ in $\{W_n^\theta, n \ge 0\}$ in the
proof. First, we prove that $\{W_n, n \ge 0\}$ is Harris recurrent. Note that the
transition probability kernel of the Markov chain $\{(X_n, \xi_n), n \ge 0\}$ defined
in (2.2) has a probability density function, and the random matrices defined in
(2.9) and (2.10) also have a probability density with respect to $\mu$. Therefore,
there exists a measurable function $g : \mathcal{X} \times \mathcal{X} \to [0, \infty)$ such that

(7.1) $P(w, dw') = g(w, w')\, d\mu(w')$,
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where g{'w,w') dfi{'w') > 0 for all w G X. For an arbitrary stopping time 
r = h{Wn) for Wn, let P^(n;, •) := P^(VF^ £ •) for wGX.For Ag B{V') and 
B G B{p{r^) X P{R^)), define 

gP{AxB)-.= [ F{Wr{w')GAxB}dfx{w'). 

Jx 

Then 

T X i?) = [ F'^{w',Ax B)g{w,w')dfi{w') 

Jx 

= / F{Wr{w') G A X B}g{w,w') dg,{w') 

Jx 

for all A G B{'D') and B G B{P{R'^) x P{R‘^)). Therefore, given any P^- 
a.s. finite stopping time r for {Wn,n > 0}, the family (P'^+^(r(;, is 

nonsingular with respect to . 

We have thus particularly shown that, if P has a probability density with 
respect to fi, then P” has a probability density with respect to g, for all n > 1 
(with, in general, different g). Let gr be such that 

(7.2) F'^~^^{w,dw') = gr{w,w') dg'^ {w'), wGX, 

where gr{w,w') dg'^{w') > 0 for all w G X. It is easy to check that all 
g and g'^ are absolutely continuous with respect to m. 

Next, under condition Cl, for each m-positive ^ x i? let 

ro(A X B):={w G X :P^{II4, G Ax B i.o.} = 1} 

satisfy m(ro(A x B)) = 1 and, thus, also P(r(;,ro(^ x B)) = 1 for m-almost 
all w G X. Recursively, define 

Tn+i{A X B) := {tc E r„(A x R):P(rt;,r„(^ x B)) = 1} 

for n > 0. Then m{Tn{A x B)) = 1 for all n > 0 and Tn{A x B) [ roo(^ x 
B) := nfc>orfc(^ X B), as oo, giving m(roo(^ x B)) = 1. Furthermore, 
roo(^ X B) is absorbing because, by construction, P(t(;, Tn{A x B)) = 1 for all 
w G roo(^ X B) and n > 0, and, thus, F{w, roo(^ x B)) = lim„^oo F{w, Tn{A x 
B)) = 1 for all w G roo(^ x B). 

In particular, put r = 1. Denote B^ as the complement of B. Since m(roo(T)'^) = 0, 
also g{TaoiXy) = 0. It is now obvious from the previous considerations that 
we can choose (5 > 0 sufficiently small such that 


ir^{X) JxJr^{x) 


^{g> 5 }{wi,W 2 )l{g>s}{w 2 ,W 3 ) dg{w 3 ) dg{w 2 ) dm{wi) > 0. 


Hence, by Lemma 4.3 of Niemi and Nummelin (1986), there exist an m- 
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positive set Fi C roo('F’) and a /i-positive set r 2 C roo(d:’) such that 
a:= inf ^ X: g{wi,W 2 ) > S, g{w 2 ,W 3 ) > 6} > 0. 

ioisri,io3sr2 

A combination of the above result with (7.1) and (7.2) implies 

F^{wi,A X B) = [ F{w2,AxB)¥‘^{wi,dw2) 

Jx 

(7.3) > / g2{wi,W2) / g{w2,W3)dg,{w3)dn{w2) 

Jx JAxBnr2 

> a5‘^g{A x Br\ r 2 ) 

for all tci G Fi and A x B € B{X). By defining El := Foo(A’i), we obtain an 
absorbing set such that Fi is a regeneration set for {Wn,n > 0} restricted 
to H, that is, Fi is recurrent and satisfies a minorization condition, namely 

(7.3) . This proves the Harris recurrence of {Wn, n > 0} on H. By the previous 
construction, it is easy to see that El = A. Since {Wn-iTi > 0} possesses a 
stationary distribution, it is clearly positive Harris recurrent. 

Next, we give the proof of aperiodicity. If {Wn,n > 0} were g-periodic 
with cyclic classes Fi,... ,Fq, say, then the g-skeleton {Wng)n>o would have 
stationary distributions for k = 1,... ,q. On the other hand, Yn is 

aperiodic by definition and Tnq{9)u is also a product of random matrices 
satisfying condition Cl and thus possesses only one stationary distribution. 
Consequently, <7 = 1 and {Wn,n > 0} is aperiodic. 

Note that we have € A x B i.o.} = 1 for all w € X and all m- 

positive open A x B € B{X). Denote p{B) as the first return time to B for 
Wn- Hence, m(int(A’)) > 0 ensures Fw{p{X) < oo) = 1 for all w G X, which 
easily yields the /r-irreducibility of {Wn-,n > 0}. 

Under conditions C1 and C2, the property of $V$-uniform ergodicity is
taken from Lemma 4 of Fuh (2003). The finiteness A1-A3 of the moments
comes from C2 and a simple calculation. The details are omitted. □

To prove the main results in Section 4, our first aim is to find a sequence
of $p$'s that converge to 0, for which the stopping times $N_{B,p}$ converge to an
appropriate stopping time. Furthermore, for technical reasons, we want all
the stopping times in the sequence to be bounded by some stopping time
with finite expectation.

Lemma 5. Consider the problem $\mathcal{B}(\beta = 0, p, c, w)$ described in Section
4. Then the following hold:

(i) There exist a constant $D_c$ and some $0 < q_0 < 1$, such that for all
$q_0 < q < 1$ and for all threshold functions $B(\cdot)$ with the property that $B(w) >
D_c$ for each $w \in \mathcal{X}$, we have
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(7.4) ^.^{Nb^p-u;\Nb,p>uj)>2c 
where Nb,p is defined as in (4.6). 

(ii) Define Nbi,p^w = LRn^p'> D, Wn = w}. Then for each 0 < p < 

1 — qo, there exists w = w{p) such that with probability 1, 

(7.5) Nb,p < Nd,p,w < Nd,i,w- 

Furthermore, there exist a state wi ^ X and a subsequence of p's, such that 

(7.5) is true with w{p) = for all the p’s in the subsequence. 

(iii) Let Bp{-) he the threshold function of the problem B{P,p,c,w), and 
assume that Bp(-) — Bq{-) for some function Bq{-). Assume further that 
the convergence is along the subsequence of p’s from (ii). Then Bq{wi) < Dc- 

(iv) Denote D* = mf{Dc\Dc as in (i)}. Then D* is nonincreasing in c 
and D*^BQ{wi)/qo asc—>oo. 

Proof. Since the induced Markov chain { W ^, n > 0} is Harris recurrent 
via Proposition 2, we may assume, without loss of generality, that there 
exists a recurrent state wq of the Markov chain governed by . By making 
use of the regeneration scheme for Harris recurrent Markov chains, the proof 
of Lemma 5 is similar to that of Lemma 1 in Yakir (1994). The details are 
omitted. □ 


Remark 3. Notice first that the constant $D_c$ does not depend on the
initial state $(p, w)$. Lemma 5 remains true when $c = c(p)$ is allowed to vary
with $p$, as long as $\liminf_{p \to 0} c(p) > 0$. In particular, it is correct if $c(p)$
converges to some positive $c$.

Let $(1 - \rho(N, \psi))/p$ be the normalized risk of a stopping time $N$. Using
the results of Lemma 5, we can show that for $p \to 0$, the (normalized) risk
of a converging sequence of stopping times goes to a limit. Consider the
Bayesian problem $\mathcal{B}(\beta = 0, p, c, w)$, and let $N$ be a stopping time. A similar
argument to that of Lemma 9 in Pollak (1985) implies that as $p \to 0$,

(7.6)
$$\frac{P_\pi(N > \omega)}{p} \to E_\infty N.$$


Lemma 6. Let Bp{-) be defined as in (4.6) and letefi-) = liminfp^o7?p(-)/p. 
Then with probability one, liminfc^oec(‘) = oo. 


Proof. Let Nb,p be defined as in (4.6). Suppose for all w G X, liminfc^o s-c{w) 
eoo('R’) < oo. Then for almost all w G X, Bpfiw)/{pi{l — Bpfiw)) < l-l-eoo(R’) 
for some subsequence Cj —> 0,pi —> 0 as i ^ oo. Since 

£' 7 r(Loss using Nb,p) 

= P-k{Xb,p < w ) + cPt,{Nb,p > uj)Et,{Nb,p - wi\Nb,p > co), 





it follows from (7.6) that 


1 — il^ 7 r(Loss using Ns^p) 


Pi 

Ptt{Nb,p > Ui) 
Pi 


(1 - CiE^{NB,p - u}\Nb,p > a;)) 


< 


Ptt{Nb,p > w) ^ ^ ^ 

- S - S t + T/oo^Vi+ej, 

Pi Pi 


for large enough $i$. Clearly, $E_\infty N_{1+\varepsilon_\infty} < \infty$. Hence, one can do better by
using a CUSUM rule in the hidden Markov model with large enough upper
boundary [Fuh (2003)], and this contradicts the fact that $N_{B,p}$ is a Bayes
rule. □


Lemma 7. Let $N_b^\psi$ be defined as in (4.14), and $N_{B,p}$ be defined as in
(4.6). Assume the boundary $B(w)$ defined in (4.6) is chosen as $Bg(w)$ for a
measurable function $g$ with $\int_{\mathcal{X}} g(w)\, dm(w) < \infty$. Then $E_1(N_{B,p} \mid W_0 = w) =
E_1(N_b^\psi \mid W_0 = w) + o(1)$ as $p \to 0$ and $B \to \infty$.


The proof of Lemma 7 is given in Section 8. 


Proof of Theorem 4. By using Proposition 2 and Lemmas 5-7, the
proof of Theorem 4 is similar to that of Theorem 1 in Pollak (1985). The
details are omitted. □


Lemma 8. For each $B$, let $T_B$ be defined as in (4.11) and (4.12). Then
$T_B F_n = F_{n+1}$, and, hence, there is an associated set of invariant measures $\Phi_B$
such that $T_B \phi = \phi$ for all $\phi \in \Phi_B$.


Proof. Since 

Poo(Pn,p G ds,Wn+l G dw'\Nq^b > n+l,Wn = w) 

= Poo(i?n,p G ds,Wn+i G dw'\Nq^b > n,Nq^b >n+l,Wn = w) 
= Poo(iVq,6 > n + l,Wn+l G dw \Rn,p = t, Nq^b > n,Wn = w) 


r rBiw') 

/ / ¥oo{Nq^b>n+l,Wn+l^dw\Rn,p = t, 

Jw,w'gX jo 


/w,w'£X Jo 


Nq,b >n,Wn = w) 

Poo(Pn,p G ds\Nq^b >n,Wn = w)Poo(iVg,fc > n\Wn = w)¥{w,dw') 
Poo(7?n,p G ds\Nq^b >n,Wn = w)¥oo{Nq^b > n\Wn = w)¥{w, dw') 
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C{s, w, w')F{w, dw') dFn{s, w) 

Iw,w'&x ^ C(s, W, w')F{w, dw') dFn{s, w) 
it follows that 

Iw'ex ^ w')Cit, w, w')¥{w, dw') dFn{t, w) 


Fn+l{s,w) = 


Q{Fn 


= TBFn{s,w). 


The existence of the fixed point follows by the same argument as that of
Lemma 11 in Pollak (1985). □


For a given $\gamma$, let $\mathcal{N}_\gamma$ be the set of all detection policies $N_b^\psi$, defined as
in (4.14), for which $E_\infty N_b^\psi = \gamma$. In the next lemma it is shown that $\mathcal{N}_\gamma$ is
not empty. Furthermore, this set contains a stopping rule that is a limit of
Bayes stopping rules.


Lemma 9. There exist a sequenee of p's that converges to 0, a se¬ 
quence of randomized Bayes problems B{[3 = 0,p,c{p),'ipp) with the appro¬ 
priate Bayes rules N^l, defined as the detection policy Nf in (4.14) with 
Rn replaced by Rq^n, cmd a constant 0 < c < 00 such that 

(i) c{p) -^p^o c, 

(ii) EooiV,^ = 7, 

(hi) pPr {N{P = 0,p, c{p),'4}p)) -^p^Q {piip) + 7)(1 - cEiA^^^), 

where p,{'ip) = f^^^ f^ rd'ip{r,w), and p'Pp{N{P = 0,p,c{p),'ijjp)) is the nor¬ 
malized Bayes risk. 


Proof. The proof is similar to that of Lemma 2 in Yakir (1994) and is 
omitted. □ 


By Lemma 7, the difference of the expected values for the stopping rule
with constant boundary and the stopping rule with curved boundary is
$o(1)$. Therefore, we only need to consider the stopping rule with a constant
boundary in the following lemmas. Note that in this case, $\psi(s, w) = \psi(s)$.


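Lemma 10. Let

$$m(k) = \begin{cases}
\dfrac{\int_0^B s\, d\psi(s)}{\int_0^B s\, d\psi(s) + E_\infty N_b^\psi}, & k = 0, \\[2ex]
\dfrac{P_\infty(N_b^\psi > k)}{\int_0^B s\, d\psi(s) + E_\infty N_b^\psi}, & k = 1, 2, \ldots.
\end{cases}$$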
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If one uses in the problem B{j3 = 0,pi,c{pi),'ipp^), then P(a; = > 

Ll>) — > m(k) as i — > oo. Also, if one uses Nf instead of in problem 

B{fl = 0,pi,c{pi),'ifp^), then ¥{uj = k\N^ > uj) ^ m{k) asi—>oo. 


Proof. Note that P(cj = 0) /pi —> /q^ s d'ijj{s) as z ^ oo. If one uses 
in problem B{P = D,pi,c{pi),ifpf), then by using an argument similar to the 
proof of Lemmas 9 and 12 of Pollak (1985), we have


(7.7) 

Therefore, 


Pi 


rB 


%—>oo Jo 


ri’Pi 


sd'ijj{s) +EooiV^ 


^ P(N'7^ > k\uj = A:)P(a; = k) 

{uj = klNf^f, > uj) = ^ ^ ^ m{k). 


A similar argument applies when using instead of 


□ 


By using the same argument as in Lemma 13 of Pollak (1985), we also
have

1 — {Expected loss using for B{fl = 0,pj,c(pi), 


lim 

1^00 


Pi 


1 - {Expected loss using for B{I3 = 0,pi, c(pi), 


(7.8) 


= lim 

i—>oo 


X 


Qi 

Pi 


rB 


UO 
X 11 - C* 


sd'tp{s) +EooN^ 




fo^sdfj{s) + Eo,N^ 


ff sdipjs) 


+ 


/(f sdzA(s)+EooA,^i-oo 


^ lim Eoo{N^\uj = 0) 

^ 7,^00 


The following lemma generalizes Theorem 5 of Kesten (1973) from prod¬ 
ucts of i.i.d. random matrices to products of Markov random matrices. 


Lemma 11. Let $M_n$, $n \ge 0$, be the random matrices defined in (2.8) and (2.9).
Assume that $E_m \log|M_1| < 0$, but that for some $\kappa_1 > 0$, $E_m|M_1|^{\kappa_1} = 1$ and
$E_m|M_1|^{\kappa_1} \log^+|M_1| < \infty$. Assume, in addition, that $\log|M_1|$ does not have a
lattice distribution. Then the series $R = \sum_{n=0}^{\infty} M_n \cdots M_1 M_0$ converges with
probability 1, and

$$\lim_{t \to \infty} t^{\kappa_1} P(R > t) \quad \text{and} \quad \lim_{t \to \infty} t^{\kappa_1} P(R < -t)$$

exist and are finite.

Proof. By making use of the result that for all $w \in \mathcal{X}$,

$$B\, P_w\Big\{\max_{n \ge 0} \|M_n \cdots M_1 M_0 \pi\| > B\Big\} \to C \quad \text{as } B \to \infty,$$

for some constant $C$, developed in Section 3.2 of Fuh (2004) for products of
Markov random matrices, the proof of the remainder part is similar to that
of Theorem 4 and Theorem 5 in Kesten (1973). The details are omitted. □

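Lemma 12. $\int_0^B s\, d\psi(s) \big/ \big[\int_0^B s\, d\psi(s) + E_\infty N_b^\psi\big] = O((\log B)/B)$, where
$O((\log B)/B)/((\log B)/B)$ remains bounded as $B \to \infty$.


Proof. Denote by $\mathcal{F}_n$ the $\sigma$-algebra generated by $\{\xi_0, \ldots, \xi_n\}$. Since
$R^*_{n+1} = \beta(W_n, W_{n+1})(1 + R^*_n)$, it follows that $E_\infty(R^*_{n+1} \mid \mathcal{F}_n) = 1 + R^*_n$, and,
therefore, $R^*_n - n$ is a $P_\infty$-martingale with expectation $E_\infty(R^*_n - n) = E_\infty R^*_0 =
\int_0^B s\, d\psi(s)$.

By using the optional sampling theorem, we have that $\int_0^B s\, d\psi(s) = E_\infty R^*_{N_b^\psi} -
E_\infty N_b^\psi$. Therefore, $\int_0^B s\, d\psi(s) + E_\infty N_b^\psi = E_\infty R^*_{N_b^\psi} \ge B$.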
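The martingale property used above can be verified exactly in a toy setting. The sketch below is a hedged illustration for the special case of i.i.d. observations on a finite alphabet, not the hidden Markov setting of the lemma: `f0` and `f1` are hypothetical pre-change and post-change probability vectors, `r0` is a hypothetical fixed starting value, and the function enumerates every observation sequence so that the pre-change expectation of the statistic is computed without simulation error.

```python
from itertools import product


def expected_sr_statistic(n, f0, f1, r0=0.0):
    """Exact pre-change expectation of R_n for i.i.d. data on a finite alphabet."""
    total = 0.0
    for seq in product(range(len(f0)), repeat=n):
        prob, r = 1.0, r0
        for x in seq:
            prob *= f0[x]                     # pre-change probability of the path
            r = (f1[x] / f0[x]) * (1.0 + r)   # R_k = beta_k * (1 + R_{k-1})
        total += prob * r
    return total
```

Because the likelihood ratio has mean one under the pre-change law, the result equals `n + r0` for every `n`, which is the statement that $R_n - n$ has constant expectation.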

It is easy to see that for all n, 


(7.9) 


fi{s)=¥M<s\N^>n) 

>Poo(7?*<s) ^ lim Poo(7?n < s) = P(7? < s). 

n—>oo n—>oo 


Note that the limit in the above equation (7.9) follows from 7?* — Rn = 
7?q exp{X)r=i ^ 0 a.s. Poo as ra ^ oo. Hence, 





rB 

ds< P(P > s) ds. 
Jo 


Under the conditions of Lemma 10, this implies the conditions of Lemma 11
hold with $\kappa_1 = 1$. And by Lemma 11, $sP(R > s) \to 1$ as $s \to \infty$.

It follows that $\lim_{B \to \infty} \int_0^B s\, d\psi(s) / \log B \le 1$, from which Lemma 12
follows.

□ 


Proof of Theorem 5. Let Nf be a stopping time from the set Afg 
that minimizes E^A^ among all stopping times N from that set. The change 
point detection policy is a minimax policy in the sense of equations 
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(2.3) and (2.4). Notice that a limit of Bayes rules minimizes lE^A^ among all 
stopping times in the set A/)g, hence the claim of the above theorem is not 
empty. By (4.15), is an equalizer rule and note that 


< Nh < min 


n 


^ 

i<fc<npfc(4o,a) • • • )Sfc;po) 


which is the CUSUM stopping rule for hidden Markov models. By Theorem 
7 of Fuh (2003), we have = 0{logB). The rest of the proof is almost 

identical to the proof of Theorem 2 in Pollak (1985) and is thus omitted.


□ 
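The comparison with a CUSUM rule used in the proof above rests on a simple ordering, which the following Python sketch illustrates. It is a hedged sketch rather than the paper's definitions: the list `betas` of one-step likelihood ratio factors, the function name `first_crossings`, the zero starting values and the strict threshold comparison are all assumptions. The sum-type statistic $\sum_{k \le n} \prod_{i=k}^{n} \beta_i$ dominates the max-type statistic $\max_{k \le n} \prod_{i=k}^{n} \beta_i$, so at a common threshold the sum-type rule stops no later.

```python
def first_crossings(betas, threshold):
    """First times the sum-type and max-type statistics exceed threshold."""
    sr, cusum = 0.0, 0.0
    sr_time = cusum_time = None
    for n, beta in enumerate(betas, start=1):
        sr = beta * (1.0 + sr)          # sum over k of prod_{i=k}^{n} beta_i
        cusum = beta * max(1.0, cusum)  # max over k of prod_{i=k}^{n} beta_i
        if sr_time is None and sr > threshold:
            sr_time = n
        if cusum_time is None and cusum > threshold:
            cusum_time = n
    return sr_time, cusum_time
```

With `betas = [1.5, 0.8, 2.0, 2.0, 2.0]` and threshold 5, the sum-type statistic crosses at the third observation and the max-type statistic at the fifth.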


8. Proofs of Theorem 6 and Lemma 7. 

Proof of Theorem 6. Note that the probability $P_1$ and expectation
$E_1$ in this section are taken under $W_0 = w$, and we omit it for simplicity. The
proof of (5.10) rests on the nonlinear Markov renewal theory from Theorem
3 and Corollary 1. Indeed, by (5.3), the stopping time $N_b^\psi$ is based on the
thresholding of the sum of the Markov random walk $S_n$ and the nonlinear
term $\eta_n$. Note that

$$\eta_n \to \eta \quad P_1\text{-a.s.} \qquad \text{and} \qquad E_1 \eta_n \to E_1 \eta \qquad \text{as } n \to \infty,$$

and $\eta_n$, $n \ge 1$, are slowly changing under $P_1$. In order to apply Theorem 3 and
Corollary 1, we have to check the validity of the following three conditions:

(8.1) $\sum_{n=1}^{\infty} P_1\{\eta_n \le -\varepsilon n\} < \infty$ for some $0 < \varepsilon < K(P^{\theta_1}, P^{\theta_0})$;

(8.2) $\max_{0 \le k \le n} |\eta_{n+k}|$, $n \ge 1$, are $P_1$-uniformly integrable;

(8.3) lim b ¥il f^ I = 0 for some 0 < e < 1. 

^ ^ fe^oo ^ “ iC(P®i,P'^o) j 

Condition (8.1) holds trivially because $\eta_n \ge 0$. Since $\eta_n$, $n = 1, 2, \ldots$, are
nondecreasing, $\max_{0 \le k \le n} |\eta_{n+k}| = \eta_{2n}$ and to prove (8.2) it suffices to show
that $\eta_n$, $n \ge 1$, are $P_1$-uniformly integrable. Since $\eta_n \le \eta$ and, by (5.9),
$E_1 \eta < \infty$, the desired uniform integrability follows. Therefore, condition (8.2) is
satisfied.

We now turn to checking condition (8.3). By using Pljr^i > 0, and 0 < 
iC(P®i,P®o) < OO, we will prove that 


(8.4)

where $\nu_\varepsilon > 0$ for all $\varepsilon > 0$ and

(8.5)
$$\alpha_1(\varepsilon, b) = P_1\Big\{\max_{1 \le n < K_{\varepsilon,b}} S_n \ge (1 + \varepsilon)(1 - \varepsilon) b\Big\}, \qquad K_{\varepsilon,b} = (1 - \varepsilon) b / K(P^{\theta_1}, P^{\theta_0}).$$

If (8.4) is correct, then the first term on the right-hand side of (8.4) is $o(1/b)$
as $b \to \infty$. All that remains is to show that $\alpha_1(\varepsilon, b)$ in (8.5) is $o(1/b)$.

To this end, by Proposition 2 we can apply Theorem 2 of Fuh and Zhang 
(2000) to have that for all e > 0 and r > 0, 

(8.6) $\sum_{n=1}^{\infty} n^{r-1} P_1\{\max_{1 \le k \le n} (S_k - K(P^{\theta_1}, P^{\theta_0})k) \ge \varepsilon n\} < \infty$,

whenever $E_1|S_1|^2 < \infty$ and $E_1[(S_1 - K(P^{\theta_1}, P^{\theta_0}))^+]^{r+1} < \infty$. Recall that under
the conditions of Theorem 6, $E_1|S_1|^2 < \infty$, and hence the sum on the
left-hand side of the inequality (8.6) is finite for $r = 1$ and all $\varepsilon > 0$, which
implies that the summand should be $o(1/n)$. Since

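$$\alpha_1(\varepsilon, b) \le P_1\{\max_{1 \le n \le K_{\varepsilon,b}} (S_n - K(P^{\theta_1}, P^{\theta_0})n) \ge \varepsilon(1-\varepsilon)b\},$$

it follows that $\alpha_1(\varepsilon, b) = o(1/b)$.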
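
The inequality used in this last step can be checked directly. The lines below are a short verification sketch, assuming the maximum in (8.5) runs over $1 \le n \le K_{\varepsilon,b}$ and writing $K$ for $K(P^{\theta_1}, P^{\theta_0})$ purely as shorthand.

```latex
% For every n <= K_{eps,b} = (1-eps) b / K we have K n <= (1-eps) b, hence
\[
  S_n \ge (1+\varepsilon)(1-\varepsilon)b
  \;\Longrightarrow\;
  S_n - Kn \ge (1+\varepsilon)(1-\varepsilon)b - (1-\varepsilon)b
            = \varepsilon(1-\varepsilon)b .
\]
% Taking the maximum over 1 <= n <= K_{eps,b} gives the bound on alpha_1.
% Moreover, eps (1-eps) b = (eps K) K_{eps,b}, so the right-hand side is the
% summand of (8.6) at n = K_{eps,b} with eps replaced by eps K:
\[
  \alpha_1(\varepsilon,b)
  \le P_1\Bigl\{\max_{1\le n\le K_{\varepsilon,b}} (S_n - Kn)
        \ge (\varepsilon K)\,K_{\varepsilon,b}\Bigr\}
  = o(1/K_{\varepsilon,b}) = o(1/b).
\]
```

The last equality uses only that $K_{\varepsilon,b}$ is a fixed positive multiple of $b$; rounding $K_{\varepsilon,b}$ to an integer is ignored in this sketch.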

Next, we need to prove (8.4). Denote = logTR^, and let N = Nf for 
simplicity. For any C > 0, by using a change of measure argument, we have 

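$$
\begin{aligned}
P_\infty\{N \le (1-\varepsilon)bK(P^{\theta_1}, P^{\theta_0})^{-1}\}
&= E_1\{1_{\{N \le (1-\varepsilon)bK(P^{\theta_1}, P^{\theta_0})^{-1}\}}\, e^{-S_N}\} \\
&\ge E_1\{1_{\{N \le (1-\varepsilon)bK(P^{\theta_1}, P^{\theta_0})^{-1},\ S_N < C\}}\, e^{-S_N}\} \\
&\ge e^{-C} P_1\Bigl\{N \le (1-\varepsilon)bK(P^{\theta_1}, P^{\theta_0})^{-1},\ \max_{n \le (1-\varepsilon)bK(P^{\theta_1}, P^{\theta_0})^{-1}} S_n < C\Bigr\} \\
&\ge e^{-C} \Bigl[P_1\{N \le (1-\varepsilon)bK(P^{\theta_1}, P^{\theta_0})^{-1}\} - P_1\Bigl\{\max_{n \le (1-\varepsilon)bK(P^{\theta_1}, P^{\theta_0})^{-1}} S_n \ge C\Bigr\}\Bigr].
\end{aligned}
$$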

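Choosing $C < (1+\varepsilon)(1-\varepsilon)b$, we then have

(8.7) $P_1\{N \le (1-\varepsilon)b / K(P^{\theta_1}, P^{\theta_0})\} \le e^{C} P_\infty\{N \le (1-\varepsilon)bK(P^{\theta_1}, P^{\theta_0})^{-1}\} + \alpha_1(\varepsilon, b)$.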
Recall that ii* is defined in (4.13). Note that under the condition
$0 < K(P^{\theta_1}, P^{\theta_0}) < \infty$, we have


Ke,b 

Poo{iV < = E ^ 

i=l 


K, 


e,b 




{logBf 


i=l 


B - (r:(p®i,p®o))2s’ 


By choosing a suitable $C$, we find that the first term of (8.7) is at most $e^{-y_\varepsilon b}$ for some
$y_\varepsilon > 0$, which proves (8.4).

Thus, all conditions of Theorem 3 are satisfied. Applying this theorem
yields (5.10) for large $b$. □


Proof of Lemma 7. Note that the parameter A defined in (3.2) is B in
this case. Since B(w) = Bg(w) is independent of t and $\int g(w)\,dm(w) < \infty$,
ds defined in (3.7) is 0. By (8.1)-(8.3), established in the proof of Theorem
6, we can apply Corollary 1 to obtain that, as $R \to \infty$,

^i{Nb,p\Wo = w) 

"" 2Esr,b) - 

+ 0 ( 1 ). 

Here the random variables in (8.8) are the corresponding terms of (2.15) di¬ 
vided by q. We also have (p) ^ (p) ^ Em+§m+ > IEm+§m+ (p) 

IEm+Sm+ and Ap{w) —> A(tc). By using Corollary 1 again, we have 

Ei{N^\Wo = w) 

= (b + - ^m+V - J dm+{w) + A(u})^ + o(l). 

Hence, Lemma 7 is proved. □ 